RAG (Retrieval-Augmented Generation)

RAG enhances LLM responses by retrieving relevant context from your documents before generating answers.

What is RAG?

Large Language Models have a knowledge cutoff and don't know about your private data. RAG solves this by:

  1. Indexing - Split documents into chunks and convert to vector embeddings
  2. Retrieval - Find relevant chunks based on user query similarity
  3. Generation - Inject context into the prompt for accurate responses

Use RAG when you need the LLM to answer questions about:

  • Product documentation
  • Knowledge bases
  • FAQs and support articles
  • Internal wikis
  • Any domain-specific content

Quick Start

typescript
import { ChatClient, InMemoryDocumentStore, OpenAIEmbeddings } from 'meloqui';

// 1. Create embedding provider
const embeddings = new OpenAIEmbeddings();

// 2. Create document store
const store = new InMemoryDocumentStore({
  embeddingProvider: embeddings
});

// 3. Add your documents
await store.addDocuments([
  { content: 'Password reset: Go to Settings > Security...', source: 'help.md' },
  { content: 'Billing FAQ: You are charged on the 1st...', source: 'faq.md' }
]);

// 4. Create client with RAG enabled
const client = new ChatClient({
  provider: 'openai',
  model: 'gpt-4o',
  documentStore: store
});

// 5. Ask questions - context is automatically retrieved
const response = await client.chat('How do I reset my password?');

Document Stores

InMemoryDocumentStore

Best for development and small datasets. Data is lost when the process exits.

typescript
import { InMemoryDocumentStore, OpenAIEmbeddings } from 'meloqui';

const store = new InMemoryDocumentStore({
  embeddingProvider: new OpenAIEmbeddings(),
  chunkSize: 500,      // Characters per chunk
  chunkOverlap: 50     // Overlap between chunks
});

// Check chunk count (useful for debugging)
console.log(`Chunks in store: ${store.chunkCount}`);

FileDocumentStore

Persists data to disk. Use for production or when you need to preserve the index.

typescript
import { FileDocumentStore, OpenAIEmbeddings } from 'meloqui';

const store = new FileDocumentStore({
  embeddingProvider: new OpenAIEmbeddings(),
  directory: './rag-data'
});

// Optionally pre-load data (called automatically on first operation)
await store.load();

// Data persists across restarts
await store.addDocuments([...]);

Concurrency

FileDocumentStore is not safe for concurrent writes from multiple processes. Data loss may occur if multiple processes call addDocuments() or delete() simultaneously. For concurrent access, use a database-backed store.
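Within a single process you can still avoid interleaved writes by funneling them through a simple queue. This is a generic pattern, not part of meloqui, and it does not help when multiple processes share the same files:

```typescript
// A minimal single-process write queue: chains every write onto the
// previous one so calls like addDocuments()/delete() never interleave.
// This does NOT protect against other processes writing the same files.
class WriteQueue {
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    const next = this.tail.then(task, task);
    this.tail = next.catch(() => undefined); // keep the chain alive on errors
    return next;
  }
}

// Usage (hypothetical):
// const queue = new WriteQueue();
// await queue.run(() => store.addDocuments(docs));
// await queue.run(() => store.delete('old-file.md'));
```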

Adding Documents

Documents need a content string and source identifier:

typescript
await store.addDocuments([
  {
    content: 'Your document text here...',
    source: 'filename.md',
    metadata: { category: 'billing', version: '2.0' }  // Optional
  }
]);

Batch Loading

Add multiple documents in a single call for better performance. The embedding provider batches API requests automatically.
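A sketch of what that looks like in practice. The `files` map here is a stand-in for however you load raw content (filesystem, CMS, etc.); only `addDocuments` comes from the API above:

```typescript
// Collect everything first, then index in one addDocuments() call so
// the embedding provider can batch its API requests.
// `files` is a placeholder for your own loader.
const files: Record<string, string> = {
  'help.md': 'Password reset: Go to Settings > Security...',
  'faq.md': 'Billing FAQ: You are charged on the 1st...',
};

const documents = Object.entries(files).map(([source, content]) => ({
  source,
  content,
}));

// One call instead of one per document:
// await store.addDocuments(documents);
```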

Managing Documents

typescript
// Delete all chunks from a source
await store.delete('old-file.md');

// Clear entire store
await store.clear();

Chunking Strategies

Documents are split into chunks for embedding. The strategy affects retrieval quality.

Chunk Size

Smaller chunks are more precise but may lack context. Larger chunks provide more context but may include irrelevant information.

typescript
const store = new InMemoryDocumentStore({
  embeddingProvider: embeddings,
  chunkSize: 500,    // Default: 1000
  chunkOverlap: 100  // Default: 200
});
| Use Case | Chunk Size | Overlap |
| --- | --- | --- |
| FAQ/Short answers | 300-500 | 50 |
| Documentation | 500-1000 | 100-200 |
| Long-form content | 1000-2000 | 200-400 |

Split Boundaries

Use the chunkText utility for advanced control:

typescript
import { chunkText } from 'meloqui';

// Split at sentence boundaries (cleaner chunks)
const chunks = chunkText(document, {
  chunkSize: 500,
  chunkOverlap: 50,
  splitOn: 'sentence'
});

// Split at paragraph boundaries
const paragraphChunks = chunkText(document, {
  splitOn: 'paragraph'
});
| Boundary | Description |
| --- | --- |
| character | Split at exact positions (default, may split mid-word) |
| sentence | Split at ., ?, ! followed by whitespace |
| paragraph | Split at double newlines |
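To build intuition for the sentence rule, here is a rough standalone equivalent of "split at ., ?, ! followed by whitespace". This illustrates the behavior described in the table; it is not meloqui's actual implementation:

```typescript
// Rough illustration of sentence-boundary splitting: break after
// ., ?, or ! when followed by whitespace (lookbehind regex).
function splitSentences(text: string): string[] {
  return text
    .split(/(?<=[.?!])\s+/)
    .filter((s) => s.length > 0);
}

splitSentences('Reset your password. Go to Settings! Done?');
// → ['Reset your password.', 'Go to Settings!', 'Done?']
```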

Embedding Providers

OpenAI Embeddings

Uses text-embedding-3-small by default (1536 dimensions):

typescript
import { OpenAIEmbeddings } from 'meloqui';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,  // Optional, uses env var
  model: 'text-embedding-3-small'       // Default
});

Custom Base URL

Use OpenAI-compatible endpoints (Azure, local models):

typescript
const embeddings = new OpenAIEmbeddings({
  baseUrl: 'https://my-azure.openai.azure.com',
  apiKey: process.env.AZURE_API_KEY,
  dimensions: 768  // Override for models with different dimensions
});

Search and Retrieval

Search the store directly without using ChatClient:

typescript
const results = await store.search('password reset', 3);

for (const result of results) {
  console.log(`Score: ${result.score}`);
  console.log(`Source: ${result.metadata.source}`);
  console.log(`Content: ${result.content}\n`);
}

With ChatClient

Context is automatically retrieved and injected:

typescript
const client = new ChatClient({
  provider: 'openai',
  model: 'gpt-4o',
  documentStore: store,
  ragOptions: {
    topK: 3,        // Retrieve top 3 chunks
    minScore: 0.5   // Only use chunks with score > 0.5
  }
});

const response = await client.chat('How do I reset my password?');

RAG Options

| Option | Default | Description |
| --- | --- | --- |
| topK | 3 | Number of chunks to retrieve |
| minScore | 0 | Minimum similarity score (cosine similarity ranges from -1 to 1; relevant results typically score > 0) |
| contextTemplate | (built-in) | Custom prompt template |
| failOnError | false | Throw on retrieval failure |
| onError | - | Error callback |
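For intuition on the minScore scale, this is what cosine similarity computes between two embedding vectors:

```typescript
// Cosine similarity, the measure minScore is compared against:
// dot(a, b) / (|a| * |b|), giving a value in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosineSimilarity([1, 0], [1, 0]);  // → 1  (same direction)
cosineSimilarity([1, 0], [0, 1]);  // → 0  (unrelated)
cosineSimilarity([1, 0], [-1, 0]); // → -1 (opposite)
```

In practice, embeddings of unrelated text rarely score far below 0, which is why a small positive minScore (e.g. 0.3-0.5) is enough to filter noise.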

Error Handling

By default, RAG failures are silent - the LLM responds without context. Configure error handling based on your needs. For comprehensive error handling patterns, see the Error Handling guide.

Silent Degradation (Default)

typescript
const client = new ChatClient({
  documentStore: store,
  ragOptions: {
    // Default: failOnError: false
    onError: (error) => {
      console.warn('RAG retrieval failed:', error.message);
      // Continue without context
    }
  }
});

Fail on Error

typescript
import { RagRetrievalError } from 'meloqui';

const client = new ChatClient({
  documentStore: store,
  ragOptions: {
    failOnError: true
  }
});

try {
  await client.chat('question');
} catch (error) {
  if (error instanceof RagRetrievalError) {
    // Handle retrieval failure
  }
}

Best Practices

Document Preparation

  • Clean your data - Remove boilerplate, navigation, headers/footers
  • Use consistent formatting - Markdown or plain text works best
  • Include context - Each chunk should be understandable on its own
  • Add metadata - Source, category, date for filtering
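A minimal cleanup pass along these lines, run before indexing. The patterns here are illustrative; tune them to whatever boilerplate your own sources contain:

```typescript
// Example pre-indexing cleanup: strip common HTML-export boilerplate
// so chunks contain only substantive text. Patterns are illustrative.
function cleanDocument(raw: string): string {
  return raw
    .replace(/^(Skip to content|Home\s*>\s*.*)$/gim, '') // nav links, breadcrumbs
    .replace(/<[^>]+>/g, '')                             // stray HTML tags
    .replace(/\n{3,}/g, '\n\n')                          // collapse blank runs
    .trim();
}
```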

Chunk Tuning

  • Start with defaults (1000 chars, 200 overlap)
  • Test with real queries from your users
  • Reduce chunk size if answers include irrelevant info
  • Increase overlap if context is being cut off

Performance

  • Use InMemoryDocumentStore for <10,000 chunks
  • Use FileDocumentStore for persistence
  • Consider external vector databases (Pinecone, Weaviate) for large scale

Complete Example

typescript
import {
  ChatClient,
  InMemoryDocumentStore,
  OpenAIEmbeddings,
  RagRetrievalError
} from 'meloqui';

async function main() {
  // Setup
  const store = new InMemoryDocumentStore({
    embeddingProvider: new OpenAIEmbeddings(),
    chunkSize: 500,
    chunkOverlap: 50
  });

  // Index documents
  await store.addDocuments([
    {
      content: `
        Password Reset:
        1. Go to Settings > Security
        2. Click "Reset Password"
        3. Enter current and new password
        4. Click Save
      `,
      source: 'user-guide.md',
      metadata: { category: 'security' }
    },
    {
      content: `
        Billing FAQ:
        Q: When am I charged?
        A: On the 1st of each month.

        Q: How do I cancel?
        A: Go to Settings > Billing > Cancel Plan
      `,
      source: 'faq.md',
      metadata: { category: 'billing' }
    }
  ]);

  // Create RAG-enabled client
  const client = new ChatClient({
    provider: 'openai',
    model: 'gpt-4o',
    documentStore: store,
    ragOptions: {
      topK: 2,
      minScore: 0.3,
      onError: (err) => console.warn('RAG failed:', err.message)
    }
  });

  // Query
  const response = await client.chat('How do I reset my password?');
  console.log(response.content);
}

main();

Released under the MIT License.