Data Retrieval
When creating a RAG-enabled chat endpoint, we can either use the getDataRetriever
method to obtain an instance of a vector store retriever, or we can directly provide configurations for data retriever to the chat endpoint as the retrieverConfig
.
This may or may not include data ingestion whereby we load the data, generate embeddings and store them in a vector store. To learn about data ingestion, check Data Ingestion.
Retrieving Data
When a RAG-enabled chat endpoint received a query, it uses the provided data retriever to retrieve the relevant context information from the vector store to which the data retriever belongs.
The underlying process is as follows:
- Query Embedding Generation: Each data retriever is attached to a vector store, and a vector store has an embedding model specified for it. This embedding model is used to generate embedding vectors for the given user query.
- Vector Store Search: The generated embedding vectors are used to search the vector store to retrieve the most relevant context information. You can configure your data retriever to use different types of search algorithms, such as similarity matching or Maximal Marginal Relevance (MMR), or specify the number of documents to retrieve. All such configurations will greatly impact the performance of your RAG chat endpoint. We recommend you start with default parameters, but always test with different configurations to find the best one for your use case.
- Conversion to String: The retrieved context information is converted back to a string format so it can be embedded in the query that will be sent to the LLM model. A specific prompt template maybe used to structure the query and context information in a way that the LLM model can understand and generate a meaningful response.
- Response Generation: The query with the context information is sent to the LLM model, which generates a response based on the query and context information.
Retriever Config
The data retriever configuration object is used to specify the data loader, data splitter, and vector store to use. The configuration object can also include options for generating embeddings, search options, and other configurations.
The getDataRetriever
method and the retrieverConfig
property in the chat endpoint configuration accept the same configuration object. Below are the properties that can be used to configure the retriever:
Required properties
filePath
: The path to the file to load the data from.
Optional properties
dataType
: The type of data to load. This helps ascertain the best splitting strategy. When providingfilePath
, specifyingdataType
is optional. If not specified, the data type is inferred from the file extension. If not providingfilePath
, i.e., you are specifyingdocs
orsplitDocs
, you must provide thedataType
, since it cannot be inferred from the file extension.docs
: An array ofDocument
objects containing the data to load. This is useful when you want to load data from a source not supported by QvikChat by default. You can use any LangChain-supported data loader (opens in a new tab) to load the data and provide the documents as thedocs
property. If providingdocs
, you do not need to providefilePath
but you must provide thedataType
.splitDocs
: An array containing documents that have been processed through a data splitter. This is useful when you want to load data from a source not supported by QvikChat by default. You can use any LangChain-supported text splitter (opens in a new tab) to split the data and provide the split documents as thesplitDocs
property. If providingsplitDocs
, you do not need to providedocs
. You must provide thedataType
.jsonLoaderKeysToInclude
: An object containing the keys to include when loading JSON data. This is useful when you want to load only specific keys from the JSON data.csvLoaderOptions
: An object containing options to specify when loading CSV data. This is useful when you want to specify the delimiter and other options when loading CSV data.pdfLoaderOptions
: An object containing options to specify when loading PDF data. This is useful when you want to specify additional options when loading PDF data.dataSplitterType
: Use this to specify a specific text splitter you want to use. If not specified, the default text splitter is used based on the data type.chunkingConfig
: An object containing the data chunking configuration. This is useful when you want to specify the data chunking configuration.splitterConfig
: An object containing the data splitter configuration. Use this to specify or override data splitting strategy.vectorStore
: A vector store instance to use for retrieving relevant context data for a given query. WhengenerateEmbeddings
is set totrue
, the generated embeddings are stored in the vector store. If using a hosted vector store, you can provide the instance to thevectorStore
property. To learn more, check out the Vector Store page.embeddingModel
: An embedding model instance to use for generating embeddings for the data. WhengenerateEmbeddings
is set totrue
, the embeddings are generated using the provided embedding model. If you want to use a specific embedding model, you can provide the instance to theembeddingModel
property. By default, QvikChat will use either Gemini API or OpenAI for embedding model, depending on how you have configured the project. To learn more, check out the Embedding Models page.generateEmbeddings
: A boolean value to specify whether to generate embeddings for the data. If set totrue
, embeddings are generated for the data. If set tofalse
, embeddings are not generated for the data, only the retriever instance is returned.retrievalOptions
: Use this to configure the data retrieval strategy. Check out the Data Retrieval page for more information.
Data Retriever
The getDataRetriever
method is used to obtain a vector store retriever. You can provide configurations as an argument to the method. The method returns a retriever instance that can be used to retrieve context information from the vector store.
Below is an example of a custom configured data retriever:
import { getDataRetriever } from "@oconva/qvikchat/data-retrievers";
// Index inventory data and get retriever
const inventoryDataRetriever = await getDataRetriever({
filePath: "data/knowledge-bases/inventory-data.csv",
generateEmbeddings: true,
retrievalOptions: {
k: 10, // return top 10 matching documents
searchType: "mmr", // use Maximal Marginal Relevance (MMR) for search
},
});
The getDataRetriever
method returns a retriever instance that can be used to retrieve context information from the vector store. This retriever instance can be passed to the chat endpoint configuration as the retriever
property.
import { defineChatEndpoint } from "@oconva/qvikchat/endpoints";
// open-ended RAG chat endpoint
defineChatEndpoint({
enableRAG: true,
topic: "Inventory Data", // topic of data for which RAG is enabled
retriever: inventoryDataRetriever,
});
Endpoint Retriever Config
Instead of creating a data retriever separately, you can also provide the data retriever configuration directly in the chat endpoint configuration. Data retriever will be created using the provided configuration when the chat endpoint is initialized.
Be mindful though that when sharing the same retriever configurations (for example, same data) and the generateEmbeddings
flag is set to true
, the data will be loaded and embeddings will be generated each time any chat endpoint using that retriever configuration is initialized. So, if you have multiple chat endpoints that will be sharing the same retriever config and you are generating embeddings for the data, it is recommended to create a retriever instance separately and share it across multiple chat endpoints as the retriever
instance.
// Inventory Data chat endpoint with support for chat history, authentication, caching and RAG
defineChatEndpoint({
endpoint: "rag-chat-inventory",
enableChatHistory: true,
chatHistoryStore, // chat history store instance
enableAuth: true,
apiKeyStore, // API key store instance
enableCache: true,
cacheStore, // cache store instance
enableRAG: true,
topic: "Store Inventory Data", // topic of data for which RAG is enabled
retrieverConfig: {
filePath: "rag/knowledge-bases/inventory-data.csv",
generateEmbeddings: true,
},
});
Vector Store
The data retriever is attached to a vector store, which is used to store the embeddings generated for the data. The vector store is responsible for storing the embeddings and providing search functionality to retrieve the most relevant context information based on the user query.
By default, QvikChat uses an in-memory vector store. Check Vector Store to learn how to use a hosted vector store supported by LangChain.