Vision & Multimodal

Meloqui supports multimodal models (like GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro) that can understand images.

Sending Images

Pass an array of content parts instead of a simple string:

typescript
await client.chat({
  role: 'user',
  content: [
    { type: 'text', text: 'What is in this image?' },
    { type: 'image', image: 'https://example.com/photo.jpg' }
  ]
});
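
The shape of that message can be written down as types. The sketch below is illustrative only: the type names `ContentPart` and `UserMessage` and the helper `imageMessage` are hypothetical and not taken from Meloqui's exports, and whether a given provider accepts more than one image per message is an assumption.

```typescript
// Hypothetical types for illustration; Meloqui's exported types may differ.
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image'; image: string };
type ContentPart = TextPart | ImagePart;

interface UserMessage {
  role: 'user';
  content: ContentPart[];
}

// Build a user message from one prompt followed by the given images.
function imageMessage(prompt: string, images: string[]): UserMessage {
  const content: ContentPart[] = [{ type: 'text', text: prompt }];
  for (let i = 0; i < images.length; i++) {
    content.push({ type: 'image', image: images[i] });
  }
  return { role: 'user', content };
}

const userMessage = imageMessage('What changed between these two screenshots?', [
  'https://example.com/before.png',
  'https://example.com/after.png'
]);
// userMessage.content now holds one text part followed by two image parts.
```

The resulting object has the same structure as the argument passed to client.chat above.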

Local Files

Local file paths are read and base64-encoded automatically:

typescript
await client.chat({
  role: 'user',
  content: [
    { type: 'image', image: './screenshots/dashboard.png' },
    { type: 'text', text: 'Analyze this dashboard UI.' }
  ]
});
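
For readers unfamiliar with base64 encoding, the following sketch shows the general idea of turning raw image bytes into a base64 string. It is not Meloqui's implementation: the helper name `toDataUrl` is hypothetical, and whether the library sends a data URL or a bare base64 string to the provider is an assumption.

```typescript
// Illustrative only: encodes raw bytes as a base64 data URL.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    // Each byte becomes one Latin-1 character, which is what btoa expects.
    binary += String.fromCharCode(bytes[i]);
  }
  return 'data:' + mimeType + ';base64,' + btoa(binary);
}

// The first four bytes of every PNG file.
const pngMagic = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);
const dataUrl = toDataUrl(pngMagic, 'image/png');
// dataUrl is 'data:image/png;base64,iVBORw=='
```

Base64 represents every 3 bytes as 4 characters, so the encoded payload is roughly a third larger than the file on disk.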

Supported Formats

Meloqui supports common image formats:

  • PNG, JPEG, WebP, GIF
  • Maximum file size: 20 MB (configurable via maxImageSize)
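
When accepting user uploads, it can help to check these limits before calling the API. The sketch below is a hypothetical pre-flight check, not part of Meloqui: the helper `checkImage`, the extension-to-MIME mapping, and the reading of "20MB" as 20 × 1024 × 1024 bytes are all assumptions.

```typescript
// Assumed mapping from file extension to MIME type for the listed formats.
const SUPPORTED_EXTENSIONS: Record<string, string> = {
  png: 'image/png',
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  webp: 'image/webp',
  gif: 'image/gif'
};

// Assumption: the documented 20MB default is measured in binary megabytes.
const DEFAULT_MAX_IMAGE_SIZE = 20 * 1024 * 1024;

// Returns null when the image looks acceptable, otherwise a short reason.
function checkImage(
  fileName: string,
  sizeBytes: number,
  maxImageSize: number = DEFAULT_MAX_IMAGE_SIZE
): string | null {
  const dot = fileName.lastIndexOf('.');
  const ext = dot === -1 ? '' : fileName.slice(dot + 1).toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(SUPPORTED_EXTENSIONS, ext)) {
    return 'unsupported format: ' + (ext || '(none)');
  }
  if (sizeBytes > maxImageSize) {
    return 'file too large: ' + sizeBytes + ' bytes (limit ' + maxImageSize + ')';
  }
  return null;
}

const okResult = checkImage('dashboard.PNG', 350000);   // null
const badFormat = checkImage('scan.bmp', 350000);       // 'unsupported format: bmp'
```

Checking by extension is only a heuristic; a stricter check would inspect the file's leading bytes.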

Image Optimization

Large images consume more tokens and increase latency. Meloqui can automatically resize and compress images before sending them to the API.

Installation

Image optimization requires the optional sharp package:

bash
npm install sharp

Without sharp installed, images pass through unchanged (with a warning logged).
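
Optional dependencies are commonly handled with a guarded dynamic import. The sketch below shows that general pattern under stated assumptions: `loadOptional` is a hypothetical helper, and nothing here is taken from Meloqui's actual loader.

```typescript
// Try to load an optional module; resolve to null (after a warning) if it is missing.
async function loadOptional(moduleName: string): Promise<unknown> {
  try {
    return await import(moduleName);
  } catch {
    console.warn('Optional dependency "' + moduleName + '" is not installed; continuing without it.');
    return null;
  }
}

// Pass the image through unchanged when the optimizer is unavailable.
async function maybeOptimize(image: Uint8Array): Promise<Uint8Array> {
  const sharpModule = await loadOptional('sharp');
  if (sharpModule === null) {
    return image; // No sharp: return the original bytes untouched.
  }
  // A real implementation would resize and re-encode here; omitted in this sketch.
  return image;
}
```

The caller never sees an exception for the missing package, which matches the pass-through behavior described above.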

Per-Request Optimization

Apply optimization to specific requests:

typescript
await client.chat({
  role: 'user',
  content: [
    { type: 'image', image: './high-res-photo.jpg' },
    { type: 'text', text: 'Describe this image.' }
  ]
}, {
  imageOptimization: {
    maxWidth: 1024,    // Resize if wider than 1024px
    maxHeight: 1024,   // Resize if taller than 1024px
    quality: 80,       // JPEG/WebP quality (1-100)
    format: 'jpeg'     // Convert to JPEG
  }
});

Options

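| Option    | Type   | Default         | Description                           |
| --------- | ------ | --------------- | ------------------------------------- |
| maxWidth  | number | 2048            | Maximum width in pixels               |
| maxHeight | number | 2048            | Maximum height in pixels              |
| quality   | number | 85              | Quality for JPEG/WebP (1-100)         |
| format    | string | (keep original) | Output format: 'jpeg', 'png', 'webp'  |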

Behavior

  • Images are resized to fit within maxWidth × maxHeight while maintaining aspect ratio
  • Small images are not upscaled (sharp's withoutEnlargement: true)
  • GIF images are not optimized (to preserve animation)
  • If sharp is not installed, images pass through unchanged
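
The resize rule (fit within the bounds, keep the aspect ratio, never upscale) comes down to a single scale factor. The sketch below works through that arithmetic; the helper `fitWithin` is hypothetical, and sharp's own rounding may differ by a pixel in some cases.

```typescript
interface Size {
  width: number;
  height: number;
}

// Compute output dimensions that fit inside maxWidth x maxHeight
// without changing the aspect ratio and without enlarging.
function fitWithin(original: Size, maxWidth: number = 2048, maxHeight: number = 2048): Size {
  // Capping the factor at 1 is what prevents upscaling.
  const scale = Math.min(maxWidth / original.width, maxHeight / original.height, 1);
  return {
    width: Math.round(original.width * scale),
    height: Math.round(original.height * scale)
  };
}

const resized = fitWithin({ width: 4000, height: 3000 }, 1024, 1024);
// resized is { width: 1024, height: 768 }

const untouched = fitWithin({ width: 800, height: 600 }, 1024, 1024);
// untouched is { width: 800, height: 600 }
```

For the 4000 × 3000 input, the width is the binding constraint (1024 / 4000 = 0.256), so the height scales to 3000 × 0.256 = 768.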

When to Use

Use optimization when:

  • Processing user-uploaded images of unknown size
  • Working with screenshots or high-resolution photos
  • Reducing API costs (fewer tokens for smaller images)

Skip optimization when:

  • Images are already appropriately sized
  • You need pixel-perfect accuracy
  • Processing animated GIFs

Checking Vision Support

Verify the provider supports vision before sending images:

typescript
if (client.capabilities.vision) {
  // Safe to send images
  await client.chat({
    role: 'user',
    content: [
      { type: 'image', image: './photo.jpg' },
      { type: 'text', text: 'What is this?' }
    ]
  });
} else {
  // Fall back to text-only
  await client.chat('Describe a sunset');
}
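
When the same prompt should work with or without vision, one option is to derive the text-only form from the content parts. The sketch below is an assumption-laden illustration: the `Part` type and the helper `contentFor` are hypothetical, and dropping images only makes sense when the remaining text still stands on its own.

```typescript
// Hypothetical content-part type mirroring the shapes used in this guide.
type Part = { type: 'text'; text: string } | { type: 'image'; image: string };

// Return the parts unchanged when vision is supported;
// otherwise return only the text, joined into a single string.
function contentFor(supportsVision: boolean, parts: Part[]): Part[] | string {
  if (supportsVision) {
    return parts;
  }
  const texts: string[] = [];
  for (let i = 0; i < parts.length; i++) {
    const part = parts[i];
    if (part.type === 'text') {
      texts.push(part.text);
    }
  }
  return texts.join('\n');
}

const parts: Part[] = [
  { type: 'image', image: './photo.jpg' },
  { type: 'text', text: 'Summarize the attached receipt, or ask me for the totals.' }
];
const textOnly = contentFor(false, parts);
// textOnly is 'Summarize the attached receipt, or ask me for the totals.'
```

In practice the first argument would come from client.capabilities.vision, as in the check above.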

Error Handling

If you send an image to a provider that doesn't support vision, a CapabilityError is thrown:

typescript
import { ChatClient, CapabilityError } from 'meloqui';

// `client` is an existing ChatClient instance
try {
  await client.chat({
    role: 'user',
    content: [{ type: 'image', image: './photo.jpg' }]
  });
} catch (error) {
  if (error instanceof CapabilityError) {
    console.error('This model does not support images');
  } else {
    throw error; // Do not swallow unrelated failures
  }
}

For more error handling patterns including vision fallbacks, see the Error Handling guide.
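
As a rough idea of what a vision fallback can look like, the sketch below retries with a text-only request when the first attempt fails with a capability error. It is not taken from the Error Handling guide: `withVisionFallback` is a hypothetical helper, and the requests and the error check are passed in as functions so the example runs without Meloqui installed.

```typescript
// Try the image request first; on a capability error, fall back to text only.
async function withVisionFallback<T>(
  sendWithImages: () => Promise<T>,
  sendTextOnly: () => Promise<T>,
  isCapabilityError: (error: unknown) => boolean
): Promise<T> {
  try {
    return await sendWithImages();
  } catch (error) {
    if (isCapabilityError(error)) {
      return sendTextOnly();
    }
    throw error; // Unrelated failures should still surface.
  }
}

// With Meloqui, the call might look like this (assumed usage, not verified):
// await withVisionFallback(
//   () => client.chat({ role: 'user', content: [{ type: 'image', image: './photo.jpg' }] }),
//   () => client.chat('Describe a sunset'),
//   (error) => error instanceof CapabilityError
// );
```

Only capability errors trigger the fallback; network or authentication failures are rethrown to the caller.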

Released under the MIT License.