# Local Models

Meloqui supports running LLMs locally for privacy and offline use. This guide covers the supported local model options.
## Options Comparison

| Option | Provider Type | Setup Complexity | Best For |
|---|---|---|---|
| Ollama | `ollama` | Easy | General local inference |
| Docker Model Runner | `openai` (compatible) | Easy | Docker Desktop users |
| OpenAI-compatible servers | `openai` (compatible) | Varies | Custom deployments |
## Ollama

Ollama is the simplest way to run LLMs locally.

### Setup

- Install Ollama
- Pull a model: `ollama pull llama3`
- Run the server (default: `localhost:11434`)
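Once the server is running, you can confirm it is reachable before wiring up a client. A minimal sketch using Ollama's `/api/tags` model-listing endpoint (`ollamaIsUp` is a hypothetical helper, not part of Meloqui):

```typescript
// Sketch: probe a local Ollama server. GET /api/tags lists the pulled
// models and responds 200 when the server is up.
async function ollamaIsUp(baseUrl = 'http://localhost:11434'): Promise<boolean> {
  try {
    const res = await fetch(`${baseUrl}/api/tags`);
    return res.ok;
  } catch {
    return false; // server not reachable
  }
}
```

If this returns `false`, check that the Ollama server is running and that nothing else is bound to port 11434.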
### Usage
```typescript
import { ChatClient } from 'meloqui';

const client = new ChatClient({
  provider: 'ollama',
  model: 'llama3'
});

const response = await client.chat('Why is the sky blue?');
```

### Remote Server
If Ollama is on another machine:
```typescript
const client = new ChatClient({
  provider: 'ollama',
  model: 'llama3',
  baseUrl: 'http://192.168.1.50:11434'
});
```

### CLI Assistant
```bash
npm install -g @meloqui/ollama-assistant
ollama-chat
```

Or run without a global install:
```bash
npx -p @meloqui/ollama-assistant ollama-chat
```

## Docker Model Runner
Docker Model Runner (DMR) is built into Docker Desktop and exposes an OpenAI-compatible API.
### Setup

- Enable Model Runner in Docker Desktop
- Pull a model: `docker model pull ai/smollm2`
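After pulling a model, you can check what DMR is actually serving. A sketch using the standard OpenAI-style model-listing endpoint (`GET /models`) against DMR's default base URL; the `listDmrModels` helper is illustrative, not part of Meloqui:

```typescript
// Sketch: list models via DMR's OpenAI-compatible endpoint.
// The base URL below is DMR's default; adjust if yours differs.
async function listDmrModels(
  baseUrl = 'http://localhost:12434/engines/llama.cpp/v1'
): Promise<string[]> {
  const res = await fetch(`${baseUrl}/models`);
  if (!res.ok) throw new Error(`DMR responded with ${res.status}`);
  const body = (await res.json()) as { data: { id: string }[] };
  return body.data.map((m) => m.id); // OpenAI-style list payload
}
```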
### Usage

Since DMR uses an OpenAI-compatible API, use `provider: 'openai'` with a custom `baseUrl`:
```typescript
import { ChatClient } from 'meloqui';

const client = new ChatClient({
  provider: 'openai', // OpenAI-compatible API
  model: 'ai/smollm2:latest',
  baseUrl: 'http://localhost:12434/engines/llama.cpp/v1',
  apiKey: 'not-needed' // DMR doesn't require auth
});

const response = await client.chat('Hello!');
```

### CLI Assistant
```bash
npm install -g @meloqui/docker-assistant
docker-chat
```

Or run without a global install:

```bash
npx -p @meloqui/docker-assistant docker-chat
```

### Configuration
| Environment Variable | Default |
|---|---|
| `DOCKER_MODEL` | `ai/smollm2:latest` |
| `DOCKER_BASE_URL` | `http://localhost:12434/engines/llama.cpp/v1` |
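A sketch of how such a lookup could resolve, falling back to the defaults in the table (the `resolveDmrConfig` helper is illustrative, not the assistant's actual implementation):

```typescript
// Sketch: resolve DMR settings from the environment, using the
// documented defaults from the table above as fallbacks.
type DmrConfig = { model: string; baseUrl: string };

function resolveDmrConfig(
  env: Record<string, string | undefined> = process.env
): DmrConfig {
  return {
    model: env.DOCKER_MODEL ?? 'ai/smollm2:latest',
    baseUrl: env.DOCKER_BASE_URL ?? 'http://localhost:12434/engines/llama.cpp/v1',
  };
}
```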
## OpenAI-Compatible Servers

Many local inference servers expose OpenAI-compatible APIs. Use `provider: 'openai'` with a custom `baseUrl`.

### Examples
LocalAI:

```typescript
const client = new ChatClient({
  provider: 'openai',
  model: 'gpt-3.5-turbo', // Model name depends on your setup
  baseUrl: 'http://localhost:8080/v1',
  apiKey: 'not-needed'
});
```

vLLM:
```typescript
const client = new ChatClient({
  provider: 'openai',
  model: 'meta-llama/Llama-2-7b-chat-hf',
  baseUrl: 'http://localhost:8000/v1',
  apiKey: 'not-needed'
});
```

LM Studio:
```typescript
const client = new ChatClient({
  provider: 'openai',
  model: 'local-model',
  baseUrl: 'http://localhost:1234/v1',
  apiKey: 'lm-studio'
});
```

### Provider Type Reference
| Server | Provider | Notes |
|---|---|---|
| Ollama | `ollama` | Native support |
| Docker Model Runner | `openai` | OpenAI-compatible API |
| LocalAI | `openai` | OpenAI-compatible API |
| vLLM | `openai` | OpenAI-compatible API |
| LM Studio | `openai` | OpenAI-compatible API |
| text-generation-webui | `openai` | With OpenAI extension |
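The table above can be captured as a small lookup, e.g. when validating a config file. The server keys below are illustrative identifiers, not part of Meloqui's API:

```typescript
// Sketch: provider lookup mirroring the reference table above.
type LocalProvider = 'ollama' | 'openai';

const providerFor: Record<string, LocalProvider> = {
  'ollama': 'ollama',                // native support
  'docker-model-runner': 'openai',   // OpenAI-compatible API
  'localai': 'openai',
  'vllm': 'openai',
  'lm-studio': 'openai',
  'text-generation-webui': 'openai', // with its OpenAI extension
};
```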
## Privacy

All local model options keep your data on your machine. No requests are sent to external servers.
