
Local Models

Meloqui supports running LLMs locally for privacy and offline use. This guide covers the supported local model options.

Options Comparison

| Option | Provider Type | Setup Complexity | Best For |
| --- | --- | --- | --- |
| Ollama | ollama | Easy | General local inference |
| Docker Model Runner | openai (compatible) | Easy | Docker Desktop users |
| OpenAI-compatible servers | openai (compatible) | Varies | Custom deployments |

Ollama

Ollama is one of the simplest ways to run LLMs locally.

Setup

  1. Install Ollama
  2. Pull a model: ollama pull llama3
  3. Start the server: ollama serve (listens on localhost:11434 by default)
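Before wiring the server into Meloqui, you can confirm it is up by hitting Ollama's REST API; its /api/tags endpoint lists the models you have pulled:

```shell
# Health check against Ollama's default port; prints the installed models
# as JSON if the server is running, otherwise prints a hint.
curl -sf http://localhost:11434/api/tags || echo "Ollama is not reachable on :11434"
```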

Usage

typescript
import { ChatClient } from 'meloqui';

const client = new ChatClient({
  provider: 'ollama',
  model: 'llama3'
});

const response = await client.chat('Why is the sky blue?');

Remote Server

If Ollama is on another machine:

typescript
const client = new ChatClient({
  provider: 'ollama',
  model: 'llama3',
  baseUrl: 'http://192.168.1.50:11434'
});
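A mistyped remote URL tends to surface only as a confusing connection error. A small sketch (a hypothetical helper, not part of Meloqui) that fails fast by validating the base URL with the standard URL class before constructing the client:

```typescript
// Hypothetical helper (not part of Meloqui): reject a malformed base URL
// up front instead of letting the client fail later with a network error.
function checkBaseUrl(raw: string): string {
  const url = new URL(raw); // throws TypeError on malformed input
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url.origin; // e.g. 'http://192.168.1.50:11434'
}

console.log(checkBaseUrl('http://192.168.1.50:11434'));
```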

CLI Assistant

bash
npm install -g @meloqui/ollama-assistant
ollama-chat

Or run without a global install:

bash
npx -p @meloqui/ollama-assistant ollama-chat

Docker Model Runner

Docker Model Runner (DMR) is built into Docker Desktop and exposes an OpenAI-compatible API.

Setup

  1. Enable Model Runner in Docker Desktop
  2. Pull a model: docker model pull ai/smollm2
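To confirm the model was pulled, list what Model Runner has locally (the docker model subcommand ships with Docker Desktop's Model Runner feature):

```shell
# Lists locally pulled models, or prints a hint if Model Runner is unavailable.
docker model list || echo "Docker Model Runner is not available"
```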

Usage

Since DMR uses an OpenAI-compatible API, use provider: 'openai' with a custom baseUrl:

typescript
import { ChatClient } from 'meloqui';

const client = new ChatClient({
  provider: 'openai',  // OpenAI-compatible API
  model: 'ai/smollm2:latest',
  baseUrl: 'http://localhost:12434/engines/llama.cpp/v1',
  apiKey: 'not-needed'  // DMR doesn't require auth
});

const response = await client.chat('Hello!');

CLI Assistant

bash
npm install -g @meloqui/docker-assistant
docker-chat

Or run without a global install:

bash
npx -p @meloqui/docker-assistant docker-chat

Configuration

| Environment Variable | Default |
| --- | --- |
| DOCKER_MODEL | ai/smollm2:latest |
| DOCKER_BASE_URL | http://localhost:12434/engines/llama.cpp/v1 |
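A sketch of how these variables might be consumed in your own scripts (assumes a Node process; only the variable names and defaults come from the table above):

```typescript
// Resolve DMR settings from the environment, falling back to the
// documented defaults when a variable is unset.
const model = process.env.DOCKER_MODEL ?? 'ai/smollm2:latest';
const baseUrl = process.env.DOCKER_BASE_URL ?? 'http://localhost:12434/engines/llama.cpp/v1';

console.log(`model=${model} baseUrl=${baseUrl}`);
```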

OpenAI-Compatible Servers

Many local inference servers expose OpenAI-compatible APIs. Use provider: 'openai' with a custom baseUrl.

Examples

LocalAI:

typescript
const client = new ChatClient({
  provider: 'openai',
  model: 'gpt-3.5-turbo',  // Model name depends on your setup
  baseUrl: 'http://localhost:8080/v1',
  apiKey: 'not-needed'
});

vLLM:

typescript
const client = new ChatClient({
  provider: 'openai',
  model: 'meta-llama/Llama-2-7b-chat-hf',
  baseUrl: 'http://localhost:8000/v1',
  apiKey: 'not-needed'
});

LM Studio:

typescript
const client = new ChatClient({
  provider: 'openai',
  model: 'local-model',
  baseUrl: 'http://localhost:1234/v1',
  apiKey: 'lm-studio'
});

Provider Type Reference

| Server | Provider | Notes |
| --- | --- | --- |
| Ollama | ollama | Native support |
| Docker Model Runner | openai | OpenAI-compatible API |
| LocalAI | openai | OpenAI-compatible API |
| vLLM | openai | OpenAI-compatible API |
| LM Studio | openai | OpenAI-compatible API |
| text-generation-webui | openai | With OpenAI extension |
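If your own tooling dispatches on a server name, the table can be expressed as a small lookup. This is an illustrative sketch: the keys are hypothetical identifiers, and only the provider strings come from this page:

```typescript
type Provider = 'ollama' | 'openai';

// Illustrative mapping of local servers to Meloqui provider strings,
// mirroring the reference table above. Key names are hypothetical.
const providerForServer: Record<string, Provider> = {
  'ollama': 'ollama',
  'docker-model-runner': 'openai',
  'localai': 'openai',
  'vllm': 'openai',
  'lm-studio': 'openai',
  'text-generation-webui': 'openai',
};

console.log(providerForServer['vllm']); // → openai
```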

Privacy

All local model options keep your prompts and responses on your machine. Once a model has been downloaded, no requests are sent to external servers.

Released under the MIT License.