# RAG (Retrieval-Augmented Generation)
RAG enhances LLM responses by retrieving relevant context from your documents before generating answers.
## What is RAG?
Large Language Models have a knowledge cutoff and don't know about your private data. RAG solves this by:
- **Indexing**: Split documents into chunks and convert them to vector embeddings
- **Retrieval**: Find relevant chunks based on user query similarity
- **Generation**: Inject the retrieved context into the prompt for accurate responses
Use RAG when you need the LLM to answer questions about:
- Product documentation
- Knowledge bases
- FAQs and support articles
- Internal wikis
- Any domain-specific content
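
The three steps above can be pictured with a toy, dependency-free sketch that swaps real embeddings for simple keyword overlap. The names here (`toyRetrieve`, `buildPrompt`) are illustrative only and not part of the meloqui API:

```typescript
type Doc = { content: string; source: string };

// 1. "Indexing": here we just keep the raw documents in memory.
const docs: Doc[] = [
  { content: 'Password reset: go to Settings > Security.', source: 'help.md' },
  { content: 'Billing: you are charged on the 1st of each month.', source: 'faq.md' }
];

// 2. Retrieval: score each document by how many query words it contains.
function toyRetrieve(query: string, corpus: Doc[], topK: number): Doc[] {
  const words = query.toLowerCase().split(/\W+/).filter(Boolean);
  return [...corpus]
    .map(d => ({
      doc: d,
      score: words.filter(w => d.content.toLowerCase().includes(w)).length
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(r => r.doc);
}

// 3. Generation: inject the retrieved context into the prompt.
function buildPrompt(query: string, context: Doc[]): string {
  const ctx = context.map(d => `[${d.source}] ${d.content}`).join('\n');
  return `Answer using this context:\n${ctx}\n\nQuestion: ${query}`;
}

const top = toyRetrieve('how do I reset my password', docs, 1);
const prompt = buildPrompt('How do I reset my password?', top);
```

A real setup replaces the keyword score with embedding similarity, but the shape of the pipeline is the same.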
## Quick Start

```typescript
import { ChatClient, InMemoryDocumentStore, OpenAIEmbeddings } from 'meloqui';

// 1. Create embedding provider
const embeddings = new OpenAIEmbeddings();

// 2. Create document store
const store = new InMemoryDocumentStore({
  embeddingProvider: embeddings
});

// 3. Add your documents
await store.addDocuments([
  { content: 'Password reset: Go to Settings > Security...', source: 'help.md' },
  { content: 'Billing FAQ: You are charged on the 1st...', source: 'faq.md' }
]);

// 4. Create client with RAG enabled
const client = new ChatClient({
  provider: 'openai',
  model: 'gpt-4o',
  documentStore: store
});

// 5. Ask questions - context is automatically retrieved
const response = await client.chat('How do I reset my password?');
```

## Document Stores
### InMemoryDocumentStore
Best for development and small datasets. Data is lost when the process exits.
```typescript
import { InMemoryDocumentStore, OpenAIEmbeddings } from 'meloqui';

const store = new InMemoryDocumentStore({
  embeddingProvider: new OpenAIEmbeddings(),
  chunkSize: 500,   // Characters per chunk
  chunkOverlap: 50  // Overlap between chunks
});

// Check chunk count (useful for debugging)
console.log(`Chunks in store: ${store.chunkCount}`);
```

### FileDocumentStore
Persists data to disk. Use for production or when you need to preserve the index.
```typescript
import { FileDocumentStore, OpenAIEmbeddings } from 'meloqui';

const store = new FileDocumentStore({
  embeddingProvider: new OpenAIEmbeddings(),
  directory: './rag-data'
});

// Optionally pre-load data (called automatically on first operation)
await store.load();

// Data persists across restarts
await store.addDocuments([...]);
```

#### Concurrency
`FileDocumentStore` is not safe for concurrent writes from multiple processes. Data loss may occur if multiple processes call `addDocuments()` or `delete()` simultaneously. For concurrent access, use a database-backed store.
## Adding Documents
Documents need a `content` string and a `source` identifier:
```typescript
await store.addDocuments([
  {
    content: 'Your document text here...',
    source: 'filename.md',
    metadata: { category: 'billing', version: '2.0' } // Optional
  }
]);
```

### Batch Loading
Add multiple documents in a single call for better performance. The embedding provider batches API requests automatically.
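
The grouping behind such batching can be pictured with a generic helper; this is an illustrative sketch of the idea, not meloqui's internal code:

```typescript
// Split items into groups of at most `size`, the way an embedding
// provider might group texts into one API request per batch.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const batches = toBatches([1, 2, 3, 4, 5], 2);
```

One call to `addDocuments` with many documents therefore costs far fewer round trips than one call per document.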
## Managing Documents
```typescript
// Delete all chunks from a source
await store.delete('old-file.md');

// Clear entire store
await store.clear();
```

## Chunking Strategies
Documents are split into chunks for embedding. The strategy affects retrieval quality.
### Chunk Size
Smaller chunks are more precise but may lack context. Larger chunks provide more context but may include irrelevant information.
```typescript
const store = new InMemoryDocumentStore({
  embeddingProvider: embeddings,
  chunkSize: 500,    // Default: 1000
  chunkOverlap: 100  // Default: 200
});
```

| Use Case | Chunk Size | Overlap |
|---|---|---|
| FAQ/Short answers | 300-500 | 50 |
| Documentation | 500-1000 | 100-200 |
| Long-form content | 1000-2000 | 200-400 |
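
To size these settings, note that each successive chunk advances by `chunkSize - chunkOverlap` characters. A quick arithmetic sketch (illustrative only; the library's actual splitter may differ, e.g. when snapping to sentence boundaries):

```typescript
// Rough estimate of how many chunks a text of `length` characters yields.
function estimateChunkCount(length: number, chunkSize: number, chunkOverlap: number): number {
  if (length <= chunkSize) return 1;
  const stride = chunkSize - chunkOverlap; // new characters covered per chunk
  return Math.ceil((length - chunkSize) / stride) + 1;
}

// e.g. a 10,000-character document with the defaults (1000 / 200)
// yields about 13 chunks.
```

More overlap means more chunks (and more embedding cost) for the same text, in exchange for less risk of cutting context at a boundary.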
### Split Boundaries
Use the `chunkText` utility for advanced control:
```typescript
import { chunkText } from 'meloqui';

// Split at sentence boundaries (cleaner chunks)
const sentenceChunks = chunkText(document, {
  chunkSize: 500,
  chunkOverlap: 50,
  splitOn: 'sentence'
});

// Split at paragraph boundaries
const paragraphChunks = chunkText(document, {
  splitOn: 'paragraph'
});
```

| Boundary | Description |
|---|---|
| `character` | Split at exact positions (default; may split mid-word) |
| `sentence` | Split at `.`, `?`, `!` followed by whitespace |
| `paragraph` | Split at double newlines |
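
The `sentence` rule can be sketched as a standalone regex split; this is an illustration of the boundary rule described above, not `chunkText`'s internal implementation:

```typescript
// Split where '.', '?' or '!' is immediately followed by whitespace,
// keeping the punctuation with the sentence it ends.
function splitSentences(text: string): string[] {
  return text.split(/(?<=[.?!])\s+/).filter(s => s.length > 0);
}

const parts = splitSentences('Hello world. How are you? Fine!');
```

The lookbehind keeps the terminator attached to its sentence, so each piece stays a well-formed fragment for embedding.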
## Embedding Providers

### OpenAI Embeddings

Uses `text-embedding-3-small` by default (1536 dimensions):
```typescript
import { OpenAIEmbeddings } from 'meloqui';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY, // Optional, uses env var
  model: 'text-embedding-3-small'     // Default
});
```

### Custom Base URL
Use OpenAI-compatible endpoints (Azure, local models):
```typescript
const embeddings = new OpenAIEmbeddings({
  baseUrl: 'https://my-azure.openai.azure.com',
  apiKey: process.env.AZURE_API_KEY,
  dimensions: 768 // Override for models with different dimensions
});
```

## Search and Retrieval
### Direct Search

Search the store directly without using `ChatClient`:
```typescript
const results = await store.search('password reset', 3);

for (const result of results) {
  console.log(`Score: ${result.score}`);
  console.log(`Source: ${result.metadata.source}`);
  console.log(`Content: ${result.content}\n`);
}
```

### With ChatClient
Context is automatically retrieved and injected:
```typescript
const client = new ChatClient({
  provider: 'openai',
  model: 'gpt-4o',
  documentStore: store,
  ragOptions: {
    topK: 3,      // Retrieve top 3 chunks
    minScore: 0.5 // Only use chunks with score > 0.5
  }
});

const response = await client.chat('How do I reset my password?');
```

### RAG Options
| Option | Default | Description |
|---|---|---|
| `topK` | `3` | Number of chunks to retrieve |
| `minScore` | `0` | Minimum similarity score (cosine similarity, -1 to 1; relevant results typically score > 0) |
| `contextTemplate` | (built-in) | Custom prompt template |
| `failOnError` | `false` | Throw on retrieval failure |
| `onError` | (none) | Error callback |
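
Since `minScore` thresholds on cosine similarity, it helps to see the metric itself. The store computes this internally; the helper below is a minimal sketch for intuition:

```typescript
// Cosine similarity between two embedding vectors: the dot product
// divided by the product of the vectors' lengths. Ranges from -1
// (opposite direction) through 0 (orthogonal) to 1 (same direction).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because unrelated text embeddings rarely point in opposite directions, scores in practice cluster in the upper part of the range, which is why a threshold like `0.3`-`0.5` is a reasonable starting point.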
## Error Handling

By default, RAG failures are silent: the LLM responds without context. Configure error handling based on your needs. For comprehensive error-handling patterns, see the Error Handling guide.
### Silent Degradation (Default)
```typescript
const client = new ChatClient({
  documentStore: store,
  ragOptions: {
    // Default: failOnError: false
    onError: (error) => {
      console.warn('RAG retrieval failed:', error.message);
      // Continue without context
    }
  }
});
```

### Fail on Error
```typescript
import { RagRetrievalError } from 'meloqui';

const client = new ChatClient({
  documentStore: store,
  ragOptions: {
    failOnError: true
  }
});

try {
  await client.chat('question');
} catch (error) {
  if (error instanceof RagRetrievalError) {
    // Handle retrieval failure
  }
}
```

## Best Practices
### Document Preparation

- **Clean your data**: Remove boilerplate, navigation, headers/footers
- **Use consistent formatting**: Markdown or plain text works best
- **Include context**: Each chunk should be understandable on its own
- **Add metadata**: Source, category, date for filtering
### Chunk Tuning
- Start with defaults (1000 chars, 200 overlap)
- Test with real queries from your users
- Reduce chunk size if answers include irrelevant info
- Increase overlap if context is being cut off
### Performance

- Use `InMemoryDocumentStore` for <10,000 chunks
- Use `FileDocumentStore` for persistence
- Consider external vector databases (Pinecone, Weaviate) for large scale
## Complete Example
```typescript
import {
  ChatClient,
  InMemoryDocumentStore,
  OpenAIEmbeddings,
  RagRetrievalError
} from 'meloqui';

async function main() {
  // Setup
  const store = new InMemoryDocumentStore({
    embeddingProvider: new OpenAIEmbeddings(),
    chunkSize: 500,
    chunkOverlap: 50
  });

  // Index documents
  await store.addDocuments([
    {
      content: `
        Password Reset:
        1. Go to Settings > Security
        2. Click "Reset Password"
        3. Enter current and new password
        4. Click Save
      `,
      source: 'user-guide.md',
      metadata: { category: 'security' }
    },
    {
      content: `
        Billing FAQ:
        Q: When am I charged?
        A: On the 1st of each month.
        Q: How do I cancel?
        A: Go to Settings > Billing > Cancel Plan
      `,
      source: 'faq.md',
      metadata: { category: 'billing' }
    }
  ]);

  // Create RAG-enabled client
  const client = new ChatClient({
    provider: 'openai',
    model: 'gpt-4o',
    documentStore: store,
    ragOptions: {
      topK: 2,
      minScore: 0.3,
      onError: (err) => console.warn('RAG failed:', err.message)
    }
  });

  // Query
  const response = await client.chat('How do I reset my password?');
  console.log(response.content);
}

main();
```

## Next Steps
- API Reference: RAG - Complete type definitions
- Error Handling - Comprehensive error handling patterns
- API Reference: Errors - Error type definitions
