Caching
Caching is a hard problem for LLM-powered conversational AI apps: any single query can be answered in a myriad of ways, and the context information can be large and different for each user or conversation.
QvikChat provides a simple way to cache AI model responses that is easy to implement and adds little overhead. However, this approach has some limitations, which are discussed in the following sections.
Core Concepts
- Cache Store: The cache store is a collection of cache records. It is used to manage cache records, including creating, updating, deleting, and querying cache records.
- Cache Record: Represents a record of a cache. It contains the response of the AI model, the context information, user query, expiration date and time, and a hash of the complete query.
- Query Cache Threshold: This is the number of times that a specific query is received before its response is cached. This is useful to avoid caching responses for queries that are not frequently asked.
- Caching Response: After a query is received more than the specified query cache threshold, the response of the AI model is cached in the cache store.
- Query Hash: Each cached query is stored and retrieved using a hash of the complete query. The complete query includes the user query, context information, and any previous chat history. Because the hash is a short, fixed-size key, storing and looking up cache records stays efficient.
- Cache Expiration: The cache expiration is the time after which the cache record is considered invalid and should be removed from the cache store. This is useful to avoid serving stale responses to the users.
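The way these concepts fit together can be sketched as a small request flow. This is a minimal illustration, not QvikChat's implementation: the record shape, `handleQuery`, `generateResponse`, and the constants `CACHE_AFTER` and `TTL_MS` are all hypothetical names, and the exact point at which the first receipt is counted is an assumption. In a real store the key would be the query hash.

```typescript
// Hypothetical, simplified record shape for illustration only.
type SketchRecord = {
  response?: string;
  expiresAt?: number; // epoch milliseconds
  remaining: number; // receipts left before the response is cached
};

const store = new Map<string, SketchRecord>();
const CACHE_AFTER = 3; // query cache threshold (assumed value)
const TTL_MS = 60_000; // cache expiration (assumed value)

// Stand-in for the real model call.
function generateResponse(query: string): string {
  return `answer to: ${query}`;
}

// Returns the response and whether it was served from the cache.
function handleQuery(
  key: string,
  query: string,
  now: number = Date.now()
): { response: string; fromCache: boolean } {
  const record = store.get(key);

  // Cache hit: a stored response that has not expired yet.
  if (record && record.response !== undefined && record.expiresAt !== undefined) {
    if (now < record.expiresAt) {
      return { response: record.response, fromCache: true };
    }
    store.delete(key); // expired, so drop it and start tracking again
  }

  const response = generateResponse(query);
  const current = store.get(key);
  if (current === undefined) {
    // First receipt: only start tracking the query.
    store.set(key, { remaining: CACHE_AFTER });
  } else {
    // Later receipts count down; at zero the response is cached.
    current.remaining -= 1;
    if (current.remaining <= 0) {
      current.response = response;
      current.expiresAt = now + TTL_MS;
    }
  }
  return { response, fromCache: false };
}
```

With `CACHE_AFTER = 3`, the first four receipts of a query call the model, the fourth one stores the response, and later receipts are served from the cache until the record expires.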
Why hashing the complete query?
Hashing the complete query is essential to retrieve the cache record from the cache store. The complete query includes the user query, context information, and any previous chat history. This is important because the response of the AI model can be different based on the context information and previous chat history. For example, if the user asks "Can I use it for authentication?", it is important to know what "it" refers to, which can be inferred from the previous conversation.
We use hashing here to avoid storing the complete query in the cache store, which can be large and inefficient. Instead, we store the hash of the complete query, which is a fixed-size string, and use it to retrieve the cache record from the cache store.
Hashing is implemented by the CacheStore class, which is responsible for storing and retrieving cache records. For example, the in-memory cache store uses the SHA-256 hashing algorithm to hash the complete query.
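As a rough illustration of the idea, the sketch below hashes a complete query with SHA-256 using Node's built-in `crypto` module. The `CompleteQuery` shape, the `hashCompleteQuery` helper, and the JSON serialization are assumptions made for this example; QvikChat may combine the parts differently.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of a "complete query"; field names are illustrative.
type CompleteQuery = {
  query: string;
  context?: string;
  history?: string[];
};

// Serialize the parts in a fixed order, then hash the result with SHA-256.
function hashCompleteQuery(input: CompleteQuery): string {
  const serialized = JSON.stringify([
    input.query,
    input.context ?? "",
    input.history ?? [],
  ]);
  return createHash("sha256").update(serialized, "utf8").digest("hex");
}

const hashA = hashCompleteQuery({
  query: "Can I use it for authentication?",
  history: ["Tell me about Firebase."],
});
const hashB = hashCompleteQuery({
  query: "Can I use it for authentication?",
  history: ["Tell me about JSON Web Tokens."],
});

console.log(hashA.length); // 64 hex characters, regardless of input size
console.log(hashA === hashB); // false: same question, different history
```

Because the chat history is part of the hashed input, the same user question asked in two different conversations maps to two different cache records.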
Cache Store
The cache store is a collection of cache records. It is used to manage cache records, including creating, updating, deleting, and querying cache records.
You can implement your own cache store by implementing the CacheStore interface. The following code snippet demonstrates how to create a custom cache store:
import { CacheStore } from "@oconva/qvikchat/cache";

export class CustomCacheStore implements CacheStore {
  // Implement the CacheStore interface
}
Cache Store Interface
The code below gives you an idea of the CacheRecord and CacheStore interfaces. The actual implementation may vary depending on the version of QvikChat you are using.
Cache Record Type
/**
* Cache record contains the data of a processed query, along with the expiry date of the record.
*/
export type CacheRecord = {
/** query + chat history */
query: string;
/** cached response */
response?: string;
/** expiry date of the cache record */
expiry?: ExpiryDate;
/** cache threshold. Query data is cached after this threshold reaches zero. Avoids the need to cache all data for every query. */
cacheThreshold: number;
};
Cache Store Interface
/**
* Cache store interface provides methods to manage cache records.
*/
export interface CacheStore {
/** Map containing all cache records */
cache: Map<QueryHash, CacheRecord>;
/** duration after which each record expires */
recordExpiryDuration: number;
/** threshold after which a query is cached, e.g., n=3 means a specific query will be cached if received more than 3 times */
cacheQueryAfterThreshold: number;
/**
* Check if a given query hash exists.
* @param hash - The query hash to check.
* @returns Returns true if the query hash exists in the cache, false otherwise.
*/
queryExists(hash: QueryHash): boolean;
/**
* Add a new query to the cache without caching the response.
* Primarily used to set the cache threshold for a query, i.e., to track the number of times this query is received.
* After the cache threshold is reached, the response will be cached.
* @param query - The query to add.
* @param hash - Hash value of the query, if not provided, it will be generated.
* @returns Returns void.
*/
addQuery(query: string, hash?: string): void;
/**
* Cache the response for a given query hash.
* Automatically sets the expiry date of the record based on cache store configurations.
* @param hash - The query hash to cache the response for.
* @param response - The response to cache.
* @returns Returns true if the response was cached successfully, false otherwise.
*/
cacheResponse(hash: QueryHash, response: string): boolean;
/**
* Add a new cache record with the given query and response.
* Automatically sets the expiry date of the record based on cache store configurations.
* @param query - The query to cache.
* @param response - The response to cache.
*/
addRecord(query: string, response: string): void;
/**
* Get a specific cache record.
* @param hash - The query hash of the cache record to retrieve.
* @returns Returns the cache record if found, undefined otherwise.
*/
getRecord(hash: QueryHash): CacheRecord | undefined;
/**
* Delete a cache record.
* @param hash - The query hash of the cache record to delete.
*/
deleteRecord(hash: QueryHash): void;
/**
* Check if a cache record is expired.
* @param hash - The query hash of the cache record to check.
* @returns Returns true if the cache record is expired, false otherwise.
*/
isExpired(hash: QueryHash): boolean;
/**
* Decrement the cache threshold for a query hash.
* @param hash - The query hash to decrement the cache threshold for.
*/
decrementCacheThreshold(hash: QueryHash): void;
/**
* Check if the cache threshold for a query hash has reached zero.
* @param hash - The query hash to check.
* @returns Returns true if the cache threshold has reached zero, false otherwise.
*/
isCacheThresholdReached(hash: QueryHash): boolean;
}
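To make the interface more concrete, here is a minimal Map-backed sketch of a store with the same method names. It is not QvikChat's in-memory cache store. The local `QueryHash` and `CacheRecord` definitions, the `SketchCacheStore` class, its static `hash` helper, and the choice of milliseconds for `recordExpiryDuration` are assumptions made so the example is self-contained.

```typescript
import { createHash } from "node:crypto";

// Local stand-ins so the sketch is self-contained (assumed definitions).
type QueryHash = string;
type CacheRecord = {
  query: string;
  response?: string;
  expiry?: Date;
  cacheThreshold: number;
};

class SketchCacheStore {
  cache = new Map<QueryHash, CacheRecord>();
  recordExpiryDuration: number; // assumed to be milliseconds
  cacheQueryAfterThreshold: number;

  constructor(recordExpiryDuration: number, cacheQueryAfterThreshold: number) {
    this.recordExpiryDuration = recordExpiryDuration;
    this.cacheQueryAfterThreshold = cacheQueryAfterThreshold;
  }

  // Hypothetical helper: SHA-256 hex digest of the complete query string.
  static hash(query: string): QueryHash {
    return createHash("sha256").update(query, "utf8").digest("hex");
  }

  queryExists(hash: QueryHash): boolean {
    return this.cache.has(hash);
  }

  // Start tracking a query without storing a response.
  addQuery(query: string, hash?: string): void {
    this.cache.set(hash ?? SketchCacheStore.hash(query), {
      query,
      cacheThreshold: this.cacheQueryAfterThreshold,
    });
  }

  // Store a response on an already tracked query and stamp its expiry.
  cacheResponse(hash: QueryHash, response: string): boolean {
    const record = this.cache.get(hash);
    if (!record) return false;
    record.response = response;
    record.expiry = new Date(Date.now() + this.recordExpiryDuration);
    return true;
  }

  // Add a complete record (query plus response) in one step.
  addRecord(query: string, response: string): void {
    this.cache.set(SketchCacheStore.hash(query), {
      query,
      response,
      expiry: new Date(Date.now() + this.recordExpiryDuration),
      cacheThreshold: 0,
    });
  }

  getRecord(hash: QueryHash): CacheRecord | undefined {
    return this.cache.get(hash);
  }

  deleteRecord(hash: QueryHash): void {
    this.cache.delete(hash);
  }

  // A record without an expiry date is treated as not expired here.
  isExpired(hash: QueryHash): boolean {
    const expiry = this.cache.get(hash)?.expiry;
    return expiry !== undefined && expiry.getTime() <= Date.now();
  }

  decrementCacheThreshold(hash: QueryHash): void {
    const record = this.cache.get(hash);
    if (record && record.cacheThreshold > 0) record.cacheThreshold -= 1;
  }

  isCacheThresholdReached(hash: QueryHash): boolean {
    return this.cache.get(hash)?.cacheThreshold === 0;
  }
}
```

A caller would typically check `queryExists`, then either `addQuery` for a new query or `decrementCacheThreshold` for a known one, and call `cacheResponse` once `isCacheThresholdReached` returns true.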
In-Memory Cache Store
The in-memory cache store is a simple implementation of the cache store that stores cache records in memory. It is useful for testing and development purposes, but not recommended for production use.
Adding Cache Store to Endpoint Configurations
You can use Firestore as a cache store. To do this, pass the Firestore cache store as the cacheStore option in the endpoint configurations when defining the chat endpoint.
When creating an instance of the FirestoreCacheStore class, you need to provide the following options:
- firebaseApp: The Firebase app instance.
- collectionName: The name of the collection that will store the cache.
The collection storing the cache will contain records that follow the structure of the CacheRecord type. For more information, check Cache Store Interface.
import { getFirebaseApp } from "@oconva/qvikchat/firebase";
import { FirestoreCacheStore } from "@oconva/qvikchat/cache";
import { defineChatEndpoint } from "@oconva/qvikchat/endpoints";

// Initialize Firebase App
const firebaseApp = getFirebaseApp();

// Firestore cache store
const firestoreCacheStore = new FirestoreCacheStore({
  firebaseApp,
  collectionName: "cache", // collection that will store cache
});

// Chat Endpoint with Firestore cache store
defineChatEndpoint({
  endpoint: "chat",
  enableCache: true,
  cacheStore: firestoreCacheStore,
});
Remember to initialize the Firebase app before using FirestoreCacheStore. To learn more about initializing the Firebase app, check Initialize Firebase App.