# RAG (Retrieval-Augmented Generation)
RAG enhances LLM responses by retrieving relevant context from your documents before generating answers.
## What is RAG?
Large Language Models have a knowledge cutoff and don't know about your private data. RAG solves this by:
- **Indexing**: Split documents into chunks and convert them to vector embeddings
- **Retrieval**: Find relevant chunks based on user query similarity
- **Generation**: Inject the retrieved context into the prompt for accurate responses
Use RAG when you need the LLM to answer questions about:
- Product documentation
- Knowledge bases
- FAQs and support articles
- Internal wikis
- Any domain-specific content
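
The three steps above can be pictured with a toy, dependency-free sketch that swaps real embeddings for simple keyword overlap. The names here (`toyRetrieve`, `buildPrompt`) are illustrative only and not part of the meloqui API:

```typescript
type Doc = { content: string; source: string };

// 1. "Indexing": here we just keep the raw documents in memory.
const docs: Doc[] = [
  { content: 'Password reset: go to Settings > Security.', source: 'help.md' },
  { content: 'Billing: you are charged on the 1st of each month.', source: 'faq.md' }
];

// 2. Retrieval: score each document by how many query words it contains.
function toyRetrieve(query: string, corpus: Doc[], topK: number): Doc[] {
  const words = query.toLowerCase().split(/\W+/).filter(Boolean);
  return [...corpus]
    .map(d => ({
      doc: d,
      score: words.filter(w => d.content.toLowerCase().includes(w)).length
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(r => r.doc);
}

// 3. Generation: inject the retrieved context into the prompt.
function buildPrompt(query: string, context: Doc[]): string {
  const ctx = context.map(d => `[${d.source}] ${d.content}`).join('\n');
  return `Answer using this context:\n${ctx}\n\nQuestion: ${query}`;
}

const top = toyRetrieve('how do I reset my password', docs, 1);
const prompt = buildPrompt('How do I reset my password?', top);
```

A real setup replaces the keyword score with embedding similarity, but the shape of the pipeline is the same.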
## Quick Start

```typescript
import { ChatClient, InMemoryDocumentStore, OpenAIEmbeddings } from 'meloqui';

// 1. Create embedding provider
const embeddings = new OpenAIEmbeddings();

// 2. Create document store
const store = new InMemoryDocumentStore({
  embeddingProvider: embeddings
});

// 3. Add your documents
await store.addDocuments([
  { content: 'Password reset: Go to Settings > Security...', source: 'help.md' },
  { content: 'Billing FAQ: You are charged on the 1st...', source: 'faq.md' }
]);

// 4. Create client with RAG enabled
const client = new ChatClient({
  provider: 'openai',
  model: 'gpt-4o',
  documentStore: store
});

// 5. Ask questions - context is automatically retrieved
const response = await client.chat('How do I reset my password?');
```

## Document Stores
### InMemoryDocumentStore
Best for development and small datasets. Data is lost when the process exits.
```typescript
import { InMemoryDocumentStore, OpenAIEmbeddings } from 'meloqui';

const store = new InMemoryDocumentStore({
  embeddingProvider: new OpenAIEmbeddings(),
  chunkSize: 500,   // Characters per chunk
  chunkOverlap: 50  // Overlap between chunks
});

// Check chunk count (useful for debugging)
console.log(`Chunks in store: ${store.chunkCount}`);
```

### FileDocumentStore
Persists data to disk. Use for production or when you need to preserve the index.
```typescript
import { FileDocumentStore, OpenAIEmbeddings } from 'meloqui';

const store = new FileDocumentStore({
  embeddingProvider: new OpenAIEmbeddings(),
  directory: './rag-data'
});

// Optionally pre-load data (called automatically on first operation)
await store.load();

// Data persists across restarts
await store.addDocuments([...]);
```

#### Concurrency
`FileDocumentStore` is not safe for concurrent writes from multiple processes. Data loss may occur if multiple processes call `addDocuments()` or `delete()` simultaneously. For concurrent access, use a database-backed store.
## Adding Documents
Documents need a `content` string and a `source` identifier:
```typescript
await store.addDocuments([
  {
    content: 'Your document text here...',
    source: 'filename.md',
    metadata: { category: 'billing', version: '2.0' } // Optional
  }
]);
```

### Batch Loading
Add multiple documents in a single call for better performance. The embedding provider batches API requests automatically.
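
The grouping behind such batching can be pictured with a generic helper; this is an illustrative sketch of the idea, not meloqui's internal code:

```typescript
// Split items into groups of at most `size`, the way an embedding
// provider might group texts into one API request per batch.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const batches = toBatches([1, 2, 3, 4, 5], 2);
```

One call to `addDocuments` with many documents therefore costs far fewer round trips than one call per document.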
## Managing Documents
```typescript
// Delete all chunks from a source
await store.delete('old-file.md');

// Clear entire store
await store.clear();
```

## Chunking Strategies
Documents are split into chunks for embedding. The strategy affects retrieval quality.
### Chunk Size
Smaller chunks are more precise but may lack context. Larger chunks provide more context but may include irrelevant information.
```typescript
const store = new InMemoryDocumentStore({
  embeddingProvider: embeddings,
  chunkSize: 500,    // Default: 1000
  chunkOverlap: 100  // Default: 200
});
```

| Use Case | Chunk Size | Overlap |
|---|---|---|
| FAQ/Short answers | 300-500 | 50 |
| Documentation | 500-1000 | 100-200 |
| Long-form content | 1000-2000 | 200-400 |
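
To size these settings, note that each successive chunk advances by `chunkSize - chunkOverlap` characters. A quick arithmetic sketch (illustrative only; the library's actual splitter may differ, e.g. when snapping to sentence boundaries):

```typescript
// Rough estimate of how many chunks a text of `length` characters yields.
function estimateChunkCount(length: number, chunkSize: number, chunkOverlap: number): number {
  if (length <= chunkSize) return 1;
  const stride = chunkSize - chunkOverlap; // new characters covered per chunk
  return Math.ceil((length - chunkSize) / stride) + 1;
}

// e.g. a 10,000-character document with the defaults (1000 / 200)
// yields about 13 chunks.
```

More overlap means more chunks (and more embedding cost) for the same text, in exchange for less risk of cutting context at a boundary.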
### Split Boundaries
Use the `chunkText` utility for advanced control:
```typescript
import { chunkText } from 'meloqui';

// Split at sentence boundaries (cleaner chunks)
const sentenceChunks = chunkText(document, {
  chunkSize: 500,
  chunkOverlap: 50,
  splitOn: 'sentence'
});

// Split at paragraph boundaries
const paragraphChunks = chunkText(document, {
  splitOn: 'paragraph'
});
```

| Boundary | Description |
|---|---|
| `character` | Split at exact positions (default; may split mid-word) |
| `sentence` | Split at `.`, `?`, `!` followed by whitespace |
| `paragraph` | Split at double newlines |
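
The `sentence` rule can be sketched as a standalone regex split; this is an illustration of the boundary rule described above, not `chunkText`'s internal implementation:

```typescript
// Split where '.', '?' or '!' is immediately followed by whitespace,
// keeping the punctuation with the sentence it ends.
function splitSentences(text: string): string[] {
  return text.split(/(?<=[.?!])\s+/).filter(s => s.length > 0);
}

const parts = splitSentences('Hello world. How are you? Fine!');
```

The lookbehind keeps the terminator attached to its sentence, so each piece stays a well-formed fragment for embedding.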
## Embedding Providers

### OpenAI Embeddings

Uses `text-embedding-3-small` by default (1536 dimensions):
```typescript
import { OpenAIEmbeddings } from 'meloqui';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY, // Optional, uses env var
  model: 'text-embedding-3-small'     // Default
});
```

### Custom Base URL
Use OpenAI-compatible endpoints (Azure, local models):
```typescript
const embeddings = new OpenAIEmbeddings({
  baseUrl: 'https://my-azure.openai.azure.com',
  apiKey: process.env.AZURE_API_KEY,
  dimensions: 768 // Override for models with different dimensions
});
```

## Search and Retrieval
### Direct Search

Search the store directly without using `ChatClient`:
```typescript
const results = await store.search('password reset', 3);

for (const result of results) {
  console.log(`Score: ${result.score}`);
  console.log(`Source: ${result.metadata.source}`);
  console.log(`Content: ${result.content}\n`);
}
```

### With ChatClient
Context is automatically retrieved and injected:
```typescript
const client = new ChatClient({
  provider: 'openai',
  model: 'gpt-4o',
  documentStore: store,
  ragOptions: {
    topK: 3,      // Retrieve top 3 chunks
    minScore: 0.5 // Only use chunks with score > 0.5
  }
});

const response = await client.chat('How do I reset my password?');
```

### RAG Options
| Option | Default | Description |
|---|---|---|
| `topK` | `3` | Number of chunks to retrieve |
| `minScore` | `0` | Minimum similarity score (cosine similarity, -1 to 1; relevant results typically score > 0) |
| `contextTemplate` | (built-in) | Custom prompt template |
| `failOnError` | `false` | Throw on retrieval failure |
| `onError` | (none) | Error callback |
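
Since `minScore` thresholds on cosine similarity, it helps to see the metric itself. The store computes this internally; the helper below is a minimal sketch for intuition:

```typescript
// Cosine similarity between two embedding vectors: the dot product
// divided by the product of the vectors' lengths. Ranges from -1
// (opposite direction) through 0 (orthogonal) to 1 (same direction).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because unrelated text embeddings rarely point in opposite directions, scores in practice cluster in the upper part of the range, which is why a threshold like `0.3`-`0.5` is a reasonable starting point.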
## Error Handling

By default, RAG failures are silent: the LLM responds without context. Configure error handling based on your needs. For comprehensive error-handling patterns, see the Error Handling guide.
### Silent Degradation (Default)
```typescript
const client = new ChatClient({
  documentStore: store,
  ragOptions: {
    // Default: failOnError: false
    onError: (error) => {
      console.warn('RAG retrieval failed:', error.message);
      // Continue without context
    }
  }
});
```

### Fail on Error
```typescript
import { RagRetrievalError } from 'meloqui';

const client = new ChatClient({
  documentStore: store,
  ragOptions: {
    failOnError: true
  }
});

try {
  await client.chat('question');
} catch (error) {
  if (error instanceof RagRetrievalError) {
    // Handle retrieval failure
  }
}
```

## Best Practices
### Document Preparation

- **Clean your data**: Remove boilerplate, navigation, headers/footers
- **Use consistent formatting**: Markdown or plain text works best
- **Include context**: Each chunk should be understandable on its own
- **Add metadata**: Source, category, date for filtering
### Chunk Tuning
- Start with defaults (1000 chars, 200 overlap)
- Test with real queries from your users
- Reduce chunk size if answers include irrelevant info
- Increase overlap if context is being cut off
### Performance

- Use `InMemoryDocumentStore` for <10,000 chunks
- Use `FileDocumentStore` for persistence
- Consider external vector databases (Pinecone, Weaviate) for large scale
## Complete Example
```typescript
import {
  ChatClient,
  InMemoryDocumentStore,
  OpenAIEmbeddings,
  RagRetrievalError
} from 'meloqui';

async function main() {
  // Setup
  const store = new InMemoryDocumentStore({
    embeddingProvider: new OpenAIEmbeddings(),
    chunkSize: 500,
    chunkOverlap: 50
  });

  // Index documents
  await store.addDocuments([
    {
      content: `
        Password Reset:
        1. Go to Settings > Security
        2. Click "Reset Password"
        3. Enter current and new password
        4. Click Save
      `,
      source: 'user-guide.md',
      metadata: { category: 'security' }
    },
    {
      content: `
        Billing FAQ:
        Q: When am I charged?
        A: On the 1st of each month.
        Q: How do I cancel?
        A: Go to Settings > Billing > Cancel Plan
      `,
      source: 'faq.md',
      metadata: { category: 'billing' }
    }
  ]);

  // Create RAG-enabled client
  const client = new ChatClient({
    provider: 'openai',
    model: 'gpt-4o',
    documentStore: store,
    ragOptions: {
      topK: 2,
      minScore: 0.3,
      onError: (err) => console.warn('RAG failed:', err.message)
    }
  });

  // Query
  const response = await client.chat('How do I reset my password?');
  console.log(response.content);
}

main();
```

## Next Steps
- API Reference: RAG - Complete type definitions
- Error Handling - Comprehensive error handling patterns
- API Reference: Errors - Error type definitions
