Chat Endpoint Configurations

One of the most powerful features of the QvikChat framework is the flexibility and efficiency it provides in configuring chat endpoints. From chat history, response caching, and RAG to authentication, you can configure a chat endpoint with various features by simply specifying the configurations for the endpoint. QvikChat provides all the underlying architecture to support these advanced features, so you can focus on building the chat service you need.

Configurations

Below are some of the chat endpoint configurations you can define.

  • endpoint: Server endpoint to which queries should be sent to run this chat flow.

LLM Model Configurations

  • modelConfig: Configuration for the LLM model. This can include parameters like model name, model version, temperature, max output tokens, and safety settings.
| Property | Accepted Values | Description |
| --- | --- | --- |
| name | `gpt4o`, `gemini15flash` | Name of the LLM model to use for the chat agent. If not provided, the default model for the agent type will be used. |
| version | Depends on the model being used | Version of the LLM model to use for the chat agent. If not provided, the latest version of the model will be used. |
| temperature | `0.0` to `1.0` | Controls the randomness of the output. A higher value will result in more diverse responses. |
| maxOutputTokens | Depends on the model being used | Maximum number of tokens to generate. |
| stopSequences | Array of strings | Sequences at which to stop generation. |
| safetySettings | Object | Safety settings for the model. |
| size | `1024x1024`, `1792x1024`, `1024x1792` | Size of the output image. Supported only by DALL-E models. |
| style | `vivid`, `natural` | Style of the output image. Supported only by DALL-E models. |
| quality | `preview`, `full` | Quality of the output image. Supported only by DALL-E models. |
| response_format | `b64_json`, `url` | Format of the response. Supported only by DALL-E models. |

Prompts

The system prompt is used to configure the behavior, tone, and various other characteristics of the Large Language Model (LLM) before response generation. A well-structured system prompt designed with security and safety in mind not only helps generate high-quality responses, but also helps mitigate LLM hallucinations and deter malicious usage attempts (e.g., prompt injection attacks or LLM jailbreak attacks).

  • systemPrompt: You can override the default system prompt used by QvikChat by providing your own system prompt written using Dotprompt. If not provided, the default system prompt for the agent type will be used.
  • chatPrompt: Chat prompt to use for the chat agent. If not provided, the default chat prompt for the agent type will be used.
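
As a sketch of how a custom system prompt might be supplied, the snippet below passes a prompt object to the endpoint. The prompt object itself is hypothetical here; the actual value must match the Dotprompt-based prompt type that QvikChat expects, so check the prompt documentation for the exact shape:

```typescript
import { defineChatEndpoint } from "@oconva/qvikchat/endpoints";

// Hypothetical Dotprompt-based prompt object — the real value must match
// the prompt type QvikChat expects for `systemPrompt`.
declare const customSystemPrompt: any;

defineChatEndpoint({
  endpoint: "chat",
  systemPrompt: customSystemPrompt,
});
```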

Tools

You can add tools to the execution flow of the chat endpoint. From adding simple task executing tools to complex action-taking agent workflows, you can configure the chat endpoint to use the tools you need.

  • tools: Array of tools to use for the chat agent.
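
As an illustrative sketch, a tool defined through Genkit (for example, with its tool-definition API) could be passed to the endpoint via the tools array. The tool below is a hypothetical placeholder, not part of QvikChat itself:

```typescript
import { defineChatEndpoint } from "@oconva/qvikchat/endpoints";

// Hypothetical tool — in practice, define tools through Genkit and pass
// the resulting tool references to the endpoint.
declare const getInventoryCountTool: any;

defineChatEndpoint({
  endpoint: "chat",
  tools: [getInventoryCountTool],
});
```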

Context-Restricted Chat

You can create a context-restricted chat endpoint by setting the agentType property to close-ended. This creates a close-ended chat endpoint that restricts queries to a specific topic; you must provide a topic for the close-ended chat agent. This is useful when you want to restrict the chat agent to a specific domain or topic, and it helps prevent unintended use of the chat service by ignoring context-irrelevant queries. For example, a chat service meant to answer user queries related to a product won't respond to queries about the weather.

  • agentType: Type of chat agent to use for this endpoint. Set it to close-ended to create a close-ended chat endpoint. By default, it is set to open-ended.
  • topic: Topic for the close-ended chat agent. Required if agentType is set to close-ended or if RAG is enabled. Queries are restricted to the given topic to prevent unintended use.
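
The two properties above can be combined into a minimal close-ended endpoint sketch (the topic value is illustrative):

```typescript
import { defineChatEndpoint } from "@oconva/qvikchat/endpoints";

// Close-ended endpoint: queries unrelated to the topic are ignored.
defineChatEndpoint({
  endpoint: "chat",
  agentType: "close-ended",
  topic: "inventory data", // required for close-ended agents
});
```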

Chat History Configurations

To allow conversations to be continued, you can add support for chat history to the chat endpoint. To learn more about chat history, check Chat History.

  • enableChatHistory: Enable chat history for this endpoint. If a chat ID is provided, chat history will be fetched and used to generate the response. If no chat ID is provided, a new chat ID will be generated to store the chat history, and it will be returned in the response.
  • chatHistoryStore: Chat History Store instance to use for this endpoint.
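
A minimal sketch enabling chat history is shown below. It assumes an in-memory chat history store exported from the history module (a persistent store such as FirestoreChatHistoryStore, shown in the full example later, is more suitable for production):

```typescript
import { defineChatEndpoint } from "@oconva/qvikchat/endpoints";
import { InMemoryChatHistoryStore } from "@oconva/qvikchat/history";

// Endpoint with chat history kept in memory (assumed in-memory store).
defineChatEndpoint({
  endpoint: "chat",
  enableChatHistory: true,
  chatHistoryStore: new InMemoryChatHistoryStore(),
});
```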

Auth Configurations

To add authentication to the chat endpoint, you can enable authentication and provide an API Key Store instance. To learn more about authentication, check Authentication.

  • enableAuth: Enable authentication for this endpoint. Must provide an API Key Store instance if set to true.
  • apiKeyStore: API Key Store instance to use for this endpoint.
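
A sketch of enabling API-key authentication, assuming an in-memory API key store exported from the auth module (the Firestore-backed store in the full example below is the production-oriented alternative):

```typescript
import { defineChatEndpoint } from "@oconva/qvikchat/endpoints";
import { InMemoryAPIKeyStore } from "@oconva/qvikchat/auth";

// Endpoint that requires a valid API key on each request
// (assumed in-memory key store).
defineChatEndpoint({
  endpoint: "chat",
  enableAuth: true,
  apiKeyStore: new InMemoryAPIKeyStore(),
});
```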

Cache Configurations

To cache responses to frequent queries and reduce response times and costs, you can enable caching for the chat endpoint. To learn more about caching, check Caching.

  • enableCache: Enable caching for this endpoint. Must provide a Cache Store instance if set to true.
  • cacheStore: Cache Store instance to use for this endpoint.
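
A sketch of enabling response caching, assuming an in-memory cache store exported from the cache module (swap in a persistent store such as FirestoreCacheStore for production use):

```typescript
import { defineChatEndpoint } from "@oconva/qvikchat/endpoints";
import { InMemoryCacheStore } from "@oconva/qvikchat/cache";

// Endpoint that caches responses to frequent queries
// (assumed in-memory cache store).
defineChatEndpoint({
  endpoint: "chat",
  enableCache: true,
  cacheStore: new InMemoryCacheStore(),
});
```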

RAG Configurations

Retrieval Augmented Generation (RAG) is a powerful technique that combines information retrieval with language generation to provide context-aware responses. You can enable RAG for the chat endpoint and provide a retriever method to retrieve documents for RAG. To learn more about RAG, check the RAG Guide.

| Property | Accepted Values | Description |
| --- | --- | --- |
| topic | String | Topic for the RAG chat agent. Required if RAG is enabled. Queries are restricted to the given topic to prevent unintended use. |
| enableRAG | Boolean | Enable RAG (Retrieval Augmented Generation) functionality for this endpoint. Must provide either a retriever method or the retriever configurations if set to true. |
| retriever | Function | Method to retrieve documents for RAG. Can be obtained from the `getDataRetriever` method. |
| retrieverConfig | Object | Configuration for the RAG retriever, for example, the number of documents to retrieve, the algorithm to use, etc. |
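
A sketch of enabling RAG with a retriever obtained from the `getDataRetriever` method. The file path and option names here are illustrative assumptions; the exact parameters accepted by `getDataRetriever` may differ, so consult the RAG Guide:

```typescript
import { defineChatEndpoint } from "@oconva/qvikchat/endpoints";
import { getDataRetriever } from "@oconva/qvikchat/data-retrievers";

// Hypothetical data file and options — adjust to your data source.
const retriever = await getDataRetriever({
  dataType: "csv",
  filePath: "data/inventory.csv",
  generateEmbeddings: true,
});

defineChatEndpoint({
  endpoint: "chat",
  topic: "inventory data", // required when RAG is enabled
  enableRAG: true,
  retriever,
});
```

Alternatively, pass retrieverConfig instead of a retriever and let the endpoint construct the retriever itself, as the comprehensive example at the end of this page does.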

Observability & Usage

You can set the verbose property to true to get additional information in the response. This may include usage information (such as the number of input and output tokens and characters used), tool call information, and request details.

  • verbose: If set to true, returns additional information in the response. This may include usage information (such as the number of input and output tokens and characters used), tool call information, and request details. By default, it is set to false.
defineChatEndpoint({
  endpoint: "chat",
  verbose: true,
});

The output produced by a chat endpoint with verbose enabled will contain an additional details object. This object may contain the following properties:

type details = {
  usage: {
    inputTokens?: number | undefined;
    outputTokens?: number | undefined;
    totalTokens?: number | undefined;
    inputCharacters?: number | undefined;
    outputCharacters?: number | undefined;
    inputImages?: number | undefined;
    outputImages?: number | undefined;
    inputVideos?: number | undefined;
    outputVideos?: number | undefined;
    inputAudioFiles?: number | undefined;
    outputAudioFiles?: number | undefined;
    custom?: Record<string, number> | undefined;
  };
  tool_requests: Array<{
    toolRequest: {
      name: string;
      ref?: string | undefined;
      input?: unknown;
    };
    data?: unknown;
    text?: undefined;
    media?: undefined;
    metadata?: Record<string, unknown> | undefined;
    toolResponse?: undefined;
  }>;
  request: unknown; // request details, including messages, data, etc.
};

The details included in the details object come directly from Firebase Genkit. The usage object contains information about the number of input and output tokens, characters, images, videos, audio files, and any custom data used in generating the response. The tool_requests object contains information about the tools called during response generation. The request object contains the request details, including messages, data, etc.

Response Type

You can set the responseType property to specify the type of response that the endpoint should return. The response type can be text, json, or media. By default, it is set to text.

defineChatEndpoint({
  endpoint: "chat",
  responseType: "json",
});

Please note that if you are using custom prompts with the endpoint, the output schema of those prompts must match the response type that you configure the endpoint with. Currently, responseType is available only in the alpha version and is still under testing. At present, responses are returned only as strings when using the default system and chat prompts; you can still get responses back as media or JSON, but you will need to parse the response manually.

Chat Agent Config

Under the hood, each chat endpoint uses a ChatAgent to process the query and generate responses. This chat agent has an LLM model specified for response generation, a default system prompt based on the agent type, chat prompts, and optionally, any configured tools.

You can use the chatAgentConfig property to override the default configurations for the chat agent, such as the model to use and its model configuration (as shown in the example below).

If you specify a specific model name, please ensure that you have configured the project to use that model. For models used through the Gemini API or OpenAI API, ensure that you've set up the correct environment variables. For any other models, ensure you've added the appropriate Genkit plugin. For more information on setting up Genkit plugins, check Genkit integration.

Example

Below is an example of a comprehensively configured chat endpoint. You don't need to provide all the configurations, only the ones you need; the example below is for demonstration purposes only.

import { FirestoreAPIKeyStore } from "@oconva/qvikchat/auth";
import { FirestoreCacheStore } from "@oconva/qvikchat/cache";
import { getEmbeddingModel } from "@oconva/qvikchat/data-embeddings";
import { TaskType } from "@oconva/qvikchat/data-retrievers";
import { defineChatEndpoint } from "@oconva/qvikchat/endpoints";
import { FirestoreChatHistoryStore } from "@oconva/qvikchat/history";
import { credential, getFirebaseApp } from "@oconva/qvikchat/firebase";
 
// Initialize Firebase app
const firebaseApp = getFirebaseApp({
  credential: credential.cert(
    process.env.GOOGLE_APPLICATION_CREDENTIALS as string
  ),
});
 
// Define chat endpoint with RAG, chat history, cache and auth
// uses Firestore API key store, Firestore chat history store, Firestore cache store
// uses the Gemini 1.5 Pro model for chat and the embedding-001 embedding model
// uses custom retrieval strategy for RAG
defineChatEndpoint({
  endpoint: "chat",
  topic: "inventory data",
  chatAgentConfig: {
    model: "gemini15Pro",
    modelConfig: {
      version: "latest",
      temperature: 0.5,
      maxOutputTokens: 2048,
      safetySettings: [
        {
          category: "HARM_CATEGORY_DANGEROUS_CONTENT",
          threshold: "BLOCK_LOW_AND_ABOVE",
        },
      ],
    },
  },
  enableAuth: true,
  apiKeyStore: new FirestoreAPIKeyStore({
    firebaseApp: firebaseApp,
    collectionName: "api-keys",
  }),
  enableChatHistory: true,
  chatHistoryStore: new FirestoreChatHistoryStore({
    firebaseApp: firebaseApp,
    collectionName: "chat-history",
  }),
  enableCache: true,
  cacheStore: new FirestoreCacheStore({
    firebaseApp: firebaseApp,
    collectionName: "cache",
  }),
  enableRAG: true,
  retrieverConfig: {
    filePath: "data/inventory.csv",
    csvLoaderOptions: {
      column: "products",
      separator: ",",
    },
    generateEmbeddings: true,
    retrievalOptions: {
      k: 10,
      searchType: "mmr",
    },
    embeddingModel: getEmbeddingModel({
      modelName: "embedding-001",
      taskType: TaskType.RETRIEVAL_DOCUMENT,
    }),
  },
});