Chat Endpoint Configurations
One of the most powerful features of the QvikChat framework is the flexibility and efficiency it provides in configuring chat endpoints. From chat history, response caching, and RAG to authentication, you can configure a chat endpoint with various features by simply specifying the configurations for the endpoint. QvikChat provides all the underlying architecture to support these advanced features, so you can focus on building the chat service you need.
Configurations
Below are some of the chat endpoint configurations you can define.
endpoint
: Server endpoint to which queries should be sent to run this chat flow.
LLM Model Configurations
modelConfig
: Configuration for the LLM model. This can include parameters like model name, model version, temperature, max output tokens, and safety settings.
View All Model Configurations
Property | Accepted Values | Description |
---|---|---|
name | `gpt4o`, `gemini15flash` | Name of the LLM model to use for the chat agent. If not provided, the default model for the agent type will be used. |
version | Depends on the model being used | Version of the LLM model to use for the chat agent. If not provided, the latest version of the model will be used. |
temperature | 0.0 to 1.0 | Controls the randomness of the output. A higher value will result in more diverse responses. |
maxOutputTokens | Depends on the model being used | Maximum number of tokens to generate. |
stopSequences | Array of strings | Sequences to stop generation at. |
safetySettings | Object | Safety settings for the model. |
size | `1024x1024`, `1792x1024`, `1024x1792` | Size of the output image. Supported only by DALL-E models. |
style | `vivid`, `natural` | Style of the output image. Supported only by DALL-E models. |
quality | `preview`, `full` | Quality of the output image. Supported only by DALL-E models. |
response_format | `b64_json`, `url` | Format of the response. Supported only by DALL-E models. |
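As an illustrative sketch, the model configurations above can be passed when defining an endpoint. Following the structure of the comprehensive example at the end of this page, `modelConfig` is nested under `chatAgentConfig`; the model name and values shown here are illustrative, not defaults:

```typescript
import { defineChatEndpoint } from "@oconva/qvikchat/endpoints";

// Illustrative model configuration; adjust values for the model you use.
defineChatEndpoint({
  endpoint: "chat",
  chatAgentConfig: {
    model: "gemini15Flash", // illustrative; use a model your project is configured for
    modelConfig: {
      version: "latest",
      temperature: 0.7, // higher values produce more diverse responses
      maxOutputTokens: 1024,
      stopSequences: ["END"], // stop generation when this sequence is produced
    },
  },
});
```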
Prompts
The system prompt is used to configure the behavior, tone, and various other characteristics of the Large Language Model (LLM) before response generation. A well-structured system prompt designed with security and safety in mind not only helps generate high-quality responses, but also helps mitigate LLM hallucinations and deter malicious usage attempts (e.g., prompt injection attacks or LLM jailbreak attacks).
systemPrompt
: You can override the default system prompt used by QvikChat by providing your own system prompt written using [Dotprompt]. If not provided, the default system prompt for the agent type will be used.

chatPrompt
: Chat prompt to use for the chat agent. If not provided, the default chat prompt for the agent type will be used.
Tools
You can add tools to the execution flow of the chat endpoint. From adding simple task executing tools to complex action-taking agent workflows, you can configure the chat endpoint to use the tools you need.
tools
: Array of tools to use for the chat agent.
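As a hedged sketch, a tool could be defined using Genkit's `defineTool` and passed in the `tools` array. The import path may differ by Genkit version, and the tool shown here is hypothetical:

```typescript
import { defineTool } from "@genkit-ai/ai"; // import path may differ by Genkit version
import { z } from "zod";
import { defineChatEndpoint } from "@oconva/qvikchat/endpoints";

// Hypothetical tool that returns the status of an order by its ID.
const getOrderStatus = defineTool(
  {
    name: "getOrderStatus",
    description: "Get the status of an order by its ID",
    inputSchema: z.object({ orderId: z.string() }),
    outputSchema: z.string(),
  },
  async ({ orderId }) => `Order ${orderId} is in transit.`
);

// Pass the tool to the chat endpoint so the LLM can call it.
defineChatEndpoint({
  endpoint: "chat-with-tools",
  tools: [getOrderStatus],
});
```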
Context-Restricted Chat
You can create a context-restricted chat endpoint by setting the `agentType` property to `close-ended`. This will create a close-ended chat endpoint that restricts queries to a specific topic. You must provide a `topic` for the close-ended chat agent. This is useful when you want to restrict the chat agent to a specific domain or topic, and it helps prevent unintended use of the chat service by ignoring contextually irrelevant queries; for example, a chat service meant to answer user queries related to a product won't respond to queries about the weather.
agentType
: Type of chat agent to use for this endpoint. Can be set to `close-ended` to create a close-ended chat endpoint. By default, it is set to `open-ended`.

topic
: Topic for the close-ended chat agent. Required if either `agentType` is set to `close-ended` or if RAG is enabled. Queries are restricted to be relevant to the given topic so as to prevent unintended use.
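For example, a close-ended endpoint restricted to product-related queries might look like this (the endpoint name and topic are illustrative):

```typescript
import { defineChatEndpoint } from "@oconva/qvikchat/endpoints";

// Close-ended endpoint: queries irrelevant to the topic are ignored.
defineChatEndpoint({
  endpoint: "product-chat",
  agentType: "close-ended",
  topic: "product support", // required for close-ended agents
});
```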
Chat History Configurations
To enable conversations to be continued across requests, you can add chat history support to the chat endpoint. To learn more about chat history, check Chat History.
enableChatHistory
: Enable chat history for this endpoint. If a chat ID is provided, chat history will be fetched and used to generate the response. If no chat ID is provided, a new chat ID will be generated to store the chat history, and will be returned in the response.

chatHistoryStore
: Chat History Store instance to use for this endpoint.
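A minimal sketch enabling chat history with a Firestore-backed store, following the comprehensive example at the end of this page (the collection name is illustrative):

```typescript
import { defineChatEndpoint } from "@oconva/qvikchat/endpoints";
import { FirestoreChatHistoryStore } from "@oconva/qvikchat/history";
import { credential, getFirebaseApp } from "@oconva/qvikchat/firebase";

// Initialize Firebase app using service account credentials.
const firebaseApp = getFirebaseApp({
  credential: credential.cert(
    process.env.GOOGLE_APPLICATION_CREDENTIALS as string
  ),
});

// Endpoint with chat history stored in Firestore.
defineChatEndpoint({
  endpoint: "chat-with-history",
  enableChatHistory: true,
  chatHistoryStore: new FirestoreChatHistoryStore({
    firebaseApp: firebaseApp,
    collectionName: "chat-history", // illustrative collection name
  }),
});
```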
Auth Configurations
To add authentication to the chat endpoint, enable authentication and provide an API Key Store instance. To learn more about authentication, check Authentication.
enableAuth
: Enable authentication for this endpoint. Must provide an API Key Store instance if set to true.

apiKeyStore
: API Key Store instance to use for this endpoint.
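A minimal sketch enabling authentication with a Firestore-backed API key store, following the comprehensive example at the end of this page (the collection name is illustrative):

```typescript
import { defineChatEndpoint } from "@oconva/qvikchat/endpoints";
import { FirestoreAPIKeyStore } from "@oconva/qvikchat/auth";
import { credential, getFirebaseApp } from "@oconva/qvikchat/firebase";

// Initialize Firebase app using service account credentials.
const firebaseApp = getFirebaseApp({
  credential: credential.cert(
    process.env.GOOGLE_APPLICATION_CREDENTIALS as string
  ),
});

// Endpoint that requires a valid API key on each request.
defineChatEndpoint({
  endpoint: "secure-chat",
  enableAuth: true,
  apiKeyStore: new FirestoreAPIKeyStore({
    firebaseApp: firebaseApp,
    collectionName: "api-keys", // illustrative collection name
  }),
});
```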
Cache Configurations
To cache responses to frequent queries and reduce response times and costs, you can enable caching for the chat endpoint. To learn more about caching, check Caching.
enableCache
: Enable caching for this endpoint. Must provide a Cache Store instance if set to true.

cacheStore
: Cache Store instance to use for this endpoint.
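A minimal sketch enabling response caching with a Firestore-backed cache store, following the comprehensive example at the end of this page (the collection name is illustrative):

```typescript
import { defineChatEndpoint } from "@oconva/qvikchat/endpoints";
import { FirestoreCacheStore } from "@oconva/qvikchat/cache";
import { credential, getFirebaseApp } from "@oconva/qvikchat/firebase";

// Initialize Firebase app using service account credentials.
const firebaseApp = getFirebaseApp({
  credential: credential.cert(
    process.env.GOOGLE_APPLICATION_CREDENTIALS as string
  ),
});

// Endpoint that caches responses to frequent queries in Firestore.
defineChatEndpoint({
  endpoint: "cached-chat",
  enableCache: true,
  cacheStore: new FirestoreCacheStore({
    firebaseApp: firebaseApp,
    collectionName: "cache", // illustrative collection name
  }),
});
```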
RAG Configurations
Retrieval Augmented Generation (RAG) is a powerful technique that combines information retrieval with language generation to provide context-aware responses. You can enable RAG for the chat endpoint and provide a retriever method to retrieve documents for RAG. To learn more about RAG, check the RAG Guide.
Property | Accepted Values | Description |
---|---|---|
topic | String | Topic for RAG chat agent. Required if RAG is enabled. Queries are restricted to be relevant to the given topic so as to prevent unintended use. |
enableRAG | Boolean | Enable RAG (Retrieval Augmented Generation) functionality for this endpoint. Must provide either a retriever method or the retriever configurations if set to true. |
retriever | Function | Method to retrieve documents for RAG. Can be obtained from the `getDataRetriever` method. |
retrieverConfig | Object | Configuration for the RAG retriever, for example, number of documents to retrieve, algorithm to use, etc. |
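A minimal RAG sketch using retriever configurations; the properties are drawn from the comprehensive example at the end of this page, and the file path and topic are illustrative:

```typescript
import { defineChatEndpoint } from "@oconva/qvikchat/endpoints";

// RAG endpoint that retrieves context from a CSV data file.
defineChatEndpoint({
  endpoint: "rag-chat",
  topic: "inventory data", // required when RAG is enabled
  enableRAG: true,
  retrieverConfig: {
    filePath: "data/inventory.csv", // illustrative data file
    generateEmbeddings: true,
  },
});
```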
Observability & Usage
You can set the `verbose` property to `true` to get additional information in the response. This may include usage information (like the number of input and output tokens used, input and output characters, etc.), tool call information, and request details.

verbose
: If set to `true`, returns additional information in the response. May include usage information (like the number of input and output tokens used, input and output characters, etc.), tool call information, and request details. By default, it is set to `false`. Read more.
defineChatEndpoint({
endpoint: "chat",
verbose: true,
});
The output produced by a chat endpoint with `verbose` enabled will contain an additional `details` object. This object may contain the following properties:
type details = {
  usage: {
    inputTokens?: number | undefined;
    outputTokens?: number | undefined;
    totalTokens?: number | undefined;
    inputCharacters?: number | undefined;
    outputCharacters?: number | undefined;
    inputImages?: number | undefined;
    outputImages?: number | undefined;
    inputVideos?: number | undefined;
    outputVideos?: number | undefined;
    inputAudioFiles?: number | undefined;
    outputAudioFiles?: number | undefined;
    custom?: Record<string, number> | undefined;
  };
  tool_requests: Array<{
    toolRequest: {
      name: string;
      ref?: string | undefined;
      input?: unknown;
    };
    data?: unknown;
    text?: undefined;
    media?: undefined;
    metadata?: Record<string, unknown> | undefined;
    toolResponse?: undefined;
  }>;
  request: unknown; // Request details, including messages, data, etc.
};
The details included in the `details` object come directly from Firebase Genkit. The `usage` object contains information about the number of input and output tokens, characters, images, videos, audio files, and any custom data used in the response. The `tool_requests` object contains information about the tools called during response generation. The `request` object contains the request details, including messages, data, etc.
Response Type
You can set the `responseType` property to specify the type of response that the endpoint should return. The response type can be `text`, `json`, or `media`. By default, it is set to `text`.
defineChatEndpoint({
endpoint: "chat",
responseType: "json",
});
Please note that if you are using custom prompts with the endpoint, the output schema of these prompts must match the response type that you configure the endpoint with. Currently, `responseType` is available only in the alpha version and is still under testing. Presently, responses are returned only as strings when using the default system and chat prompts. You can still get responses back as media or JSON, but you will need to manually parse the response.
Chat Agent Config
Under the hood, each chat endpoint uses a `ChatAgent` to process the query and generate responses. This chat agent has an LLM model specified for response generation, a default system prompt based on the agent type, chat prompts, and optionally, any configured tools.

You can use the `chatAgentConfig` property to override the default configurations for the chat agent. Below are the properties you can set in the `chatAgentConfig` object:
Please ensure that you have configured the project to use the model if you specify a specific model name. For using models through the Gemini API or OpenAI API, ensure that you've set up the correct environment variables. For any other models, please ensure you've added the Genkit plugin correctly. For more information on setting up Genkit plugins, check Genkit integration.
Example
Below is an example of a comprehensively configured chat endpoint. You don't need to provide all the configurations, only the ones you need. The example below is for demonstration purposes only.
import { FirestoreAPIKeyStore } from "@oconva/qvikchat/auth";
import { FirestoreCacheStore } from "@oconva/qvikchat/cache";
import { getEmbeddingModel } from "@oconva/qvikchat/data-embeddings";
import { TaskType } from "@oconva/qvikchat/data-retrievers";
import { defineChatEndpoint } from "@oconva/qvikchat/endpoints";
import { FirestoreChatHistoryStore } from "@oconva/qvikchat/history";
import { credential, getFirebaseApp } from "@oconva/qvikchat/firebase";
// Initialize Firebase app
const firebaseApp = getFirebaseApp({
credential: credential.cert(
process.env.GOOGLE_APPLICATION_CREDENTIALS as string
),
});
// Define chat endpoint with RAG, chat history, cache and auth
// uses Firestore API key store, Firestore chat history store, Firestore cache store
// uses Gemini 15 Pro model for chat and embedding-001 embedding model
// uses custom retrieval strategy for RAG
defineChatEndpoint({
endpoint: "chat",
topic: "inventory data",
chatAgentConfig: {
model: "gemini15Pro",
modelConfig: {
version: "latest",
temperature: 0.5,
maxOutputTokens: 2048,
safetySettings: [
{
category: "HARM_CATEGORY_DANGEROUS_CONTENT",
threshold: "BLOCK_LOW_AND_ABOVE",
},
],
},
},
enableAuth: true,
apiKeyStore: new FirestoreAPIKeyStore({
firebaseApp: firebaseApp,
collectionName: "api-keys",
}),
enableChatHistory: true,
chatHistoryStore: new FirestoreChatHistoryStore({
firebaseApp: firebaseApp,
collectionName: "chat-history",
}),
enableCache: true,
cacheStore: new FirestoreCacheStore({
firebaseApp: firebaseApp,
collectionName: "cache",
}),
enableRAG: true,
retrieverConfig: {
filePath: "data/inventory.csv",
csvLoaderOptions: {
column: "products",
separator: ",",
},
generateEmbeddings: true,
retrievalOptions: {
k: 10,
searchType: "mmr",
},
embeddingModel: getEmbeddingModel({
modelName: "embedding-001",
taskType: TaskType.RETRIEVAL_DOCUMENT,
}),
},
});