Vision & Multimodal
Meloqui supports multimodal models (like GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro) that can understand images.
Sending Images
Pass an array of content parts instead of a simple string:
await client.chat({
  role: 'user',
  content: [
    { type: 'text', text: 'What is in this image?' },
    { type: 'image', image: 'https://example.com/photo.jpg' }
  ]
});

Local Files
Local file paths are automatically base64 encoded:
await client.chat({
  role: 'user',
  content: [
    { type: 'image', image: './screenshots/dashboard.png' },
    { type: 'text', text: 'Analyze this dashboard UI.' }
  ]
});

Supported Formats
Meloqui supports common image formats:
- PNG, JPEG, WebP, GIF
- Maximum file size: 20MB (configurable via maxImageSize)
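The limits above can be checked before a request is built. The following is a minimal sketch, not part of Meloqui's API: the `checkImage` helper and its result shape are hypothetical, the format is inferred from the file name's extension only, and 20MB is assumed to mean 20 × 1024 × 1024 bytes.

```typescript
// Hypothetical pre-flight check; not part of Meloqui's API.
const SUPPORTED_EXTENSIONS = ['png', 'jpg', 'jpeg', 'webp', 'gif'];

// Assumption: "20MB" means 20 * 1024 * 1024 bytes.
const DEFAULT_MAX_IMAGE_SIZE = 20 * 1024 * 1024;

interface ImageCheck {
  ok: boolean;
  reason?: string;
}

function checkImage(
  fileName: string,
  sizeBytes: number,
  maxImageSize: number = DEFAULT_MAX_IMAGE_SIZE
): ImageCheck {
  // Treat the text after the last dot as the extension.
  const dot = fileName.lastIndexOf('.');
  const ext = dot === -1 ? '' : fileName.slice(dot + 1).toLowerCase();

  if (SUPPORTED_EXTENSIONS.indexOf(ext) === -1) {
    return { ok: false, reason: `unsupported format: ${ext || '(none)'}` };
  }
  if (sizeBytes > maxImageSize) {
    return { ok: false, reason: `file is larger than ${maxImageSize} bytes` };
  }
  return { ok: true };
}
```

A check like this only looks at the name and byte count, so it would not catch a mislabeled file; it is meant as a cheap guard in front of the request, with the real validation left to the library.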
Image Optimization
Large images consume more tokens and increase latency. Meloqui can automatically resize and compress images before sending them to the API.
Installation
Image optimization requires the optional sharp package:
npm install sharp

Without sharp installed, images pass through unchanged (with a warning logged).
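If an application wants to know up front whether optimization will actually run, it can probe for the optional dependency itself. This is a sketch of one way to do that, and it is not a description of how Meloqui detects sharp internally; `hasSharp` is a hypothetical helper name.

```typescript
// Hypothetical helper: reports whether the optional `sharp` package can be loaded.
async function hasSharp(): Promise<boolean> {
  // A non-literal specifier keeps the TypeScript compiler from requiring
  // `sharp` to be installed at build time.
  const moduleName: string = 'sharp';
  try {
    await import(moduleName);
    return true;
  } catch {
    return false;
  }
}

// Example: surface the situation once at startup.
hasSharp().then((available) => {
  if (!available) {
    console.warn('sharp is not installed; images will be sent unoptimized');
  }
});
```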
Per-Request Optimization
Apply optimization to specific requests:
await client.chat({
  role: 'user',
  content: [
    { type: 'image', image: './high-res-photo.jpg' },
    { type: 'text', text: 'Describe this image.' }
  ]
}, {
  imageOptimization: {
    maxWidth: 1024,  // Resize if wider than 1024px
    maxHeight: 1024, // Resize if taller than 1024px
    quality: 80,     // JPEG/WebP quality (1-100)
    format: 'jpeg'   // Convert to JPEG
  }
});

Options
| Option | Type | Default | Description |
|---|---|---|---|
| maxWidth | number | 2048 | Maximum width in pixels |
| maxHeight | number | 2048 | Maximum height in pixels |
| quality | number | 85 | Quality for JPEG/WebP (1-100) |
| format | string | (keep original) | Output format: 'jpeg', 'png', 'webp' |
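The table can be read as a type plus a set of defaults. The sketch below spells that out; the type and function names (`ImageOptimizationOptions`, `resolveImageOptimization`) are hypothetical and Meloqui's own exported names may differ. Only the option names, defaults, and the 1-100 quality range come from the table.

```typescript
// Hypothetical types mirroring the options table.
type ImageFormat = 'jpeg' | 'png' | 'webp';

interface ImageOptimizationOptions {
  maxWidth?: number;
  maxHeight?: number;
  quality?: number;
  format?: ImageFormat;
}

interface ResolvedImageOptimization {
  maxWidth: number;
  maxHeight: number;
  quality: number;
  // undefined means "keep the original format"
  format: ImageFormat | undefined;
}

// Fills in the documented defaults for any option the caller left out.
function resolveImageOptimization(
  options: ImageOptimizationOptions = {}
): ResolvedImageOptimization {
  const quality = options.quality ?? 85;
  if (quality < 1 || quality > 100) {
    throw new RangeError('quality must be between 1 and 100');
  }
  return {
    maxWidth: options.maxWidth ?? 2048,
    maxHeight: options.maxHeight ?? 2048,
    quality,
    format: options.format,
  };
}
```

Rejecting an out-of-range quality is a choice made for this sketch; the library may clamp or ignore such values instead.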
Behavior
- Images are resized to fit within maxWidth × maxHeight while maintaining aspect ratio
- Small images are not upscaled (withoutEnlargement: true)
- GIF images are not optimized (to preserve animation)
- If sharp is not installed, images pass through unchanged
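The first two rules amount to a small piece of arithmetic: scale by the smaller of the two ratios, and never by more than 1. The sketch below shows that calculation with a hypothetical `fitWithin` helper. Rounding to the nearest pixel is an assumption, so the library's output may differ by a pixel.

```typescript
interface Size {
  width: number;
  height: number;
}

// Sketch of "fit inside a box, keep aspect ratio, never enlarge".
function fitWithin(source: Size, maxWidth: number, maxHeight: number): Size {
  // The smaller ratio is the binding constraint; capping at 1 prevents upscaling.
  const scale = Math.min(maxWidth / source.width, maxHeight / source.height, 1);
  return {
    width: Math.max(1, Math.round(source.width * scale)),
    height: Math.max(1, Math.round(source.height * scale)),
  };
}
```

With a 1024 × 1024 limit, a 4000 × 3000 photo comes out at 1024 × 768, while an 800 × 600 image is returned at its original size.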
When to Use
Use optimization when:
- Processing user-uploaded images of unknown size
- Working with screenshots or high-resolution photos
- Reducing API costs (fewer tokens for smaller images)
Skip optimization when:
- Images are already appropriately sized
- You need pixel-perfect accuracy
- Processing animated GIFs
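For uploads of unknown size, this guidance can be turned into a simple per-file decision. The helper below is a hypothetical sketch: `shouldOptimize` is not a Meloqui function, and the 1 MiB threshold is an arbitrary stand-in for "already appropriately sized" that should be tuned per application.

```typescript
// Hypothetical decision helper based on the guidance above.
// Assumption: files at or below the threshold are treated as small enough.
function shouldOptimize(
  fileName: string,
  sizeBytes: number,
  thresholdBytes: number = 1024 * 1024
): boolean {
  // GIFs are skipped so that animation is preserved.
  if (/\.gif$/i.test(fileName)) {
    return false;
  }
  return sizeBytes > thresholdBytes;
}
```

The result can then decide whether the imageOptimization option is passed for a given request.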
Checking Vision Support
Verify the provider supports vision before sending images:
if (client.capabilities.vision) {
  // Safe to send images
  await client.chat({
    role: 'user',
    content: [
      { type: 'image', image: './photo.jpg' },
      { type: 'text', text: 'What is this?' }
    ]
  });
} else {
  // Fall back to text-only
  await client.chat('Describe a sunset');
}

Error Handling
If you send an image to a provider that doesn't support vision, a CapabilityError is thrown:
import { ChatClient, CapabilityError } from 'meloqui';

try {
  await client.chat({
    role: 'user',
    content: [{ type: 'image', image: './photo.jpg' }]
  });
} catch (error) {
  if (error instanceof CapabilityError) {
    console.error('This model does not support images');
  }
}

For more error handling patterns including vision fallbacks, see the Error Handling guide.
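As a rough illustration of a vision fallback, the sketch below retries with a text-only request when a CapabilityError is caught. It is not taken from the Error Handling guide: `withVisionFallback` is a hypothetical helper, and the local `CapabilityError` class is only a stand-in for the one exported by 'meloqui' so that the example runs on its own.

```typescript
// Stand-in for the CapabilityError exported by 'meloqui' (assumption: the
// real class can be checked with `instanceof`, as in the example above).
class CapabilityError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'CapabilityError';
    // Keeps `instanceof` working when compiled to older targets.
    Object.setPrototypeOf(this, CapabilityError.prototype);
  }
}

// Hypothetical helper: run the vision request, and fall back to the
// text-only request if the provider rejects images. Other errors propagate.
async function withVisionFallback<T>(
  visionRequest: () => Promise<T>,
  textRequest: () => Promise<T>
): Promise<T> {
  try {
    return await visionRequest();
  } catch (error) {
    if (error instanceof CapabilityError) {
      return textRequest();
    }
    throw error;
  }
}

// Intended use with a real client (not executed here):
//   await withVisionFallback(
//     () => client.chat({ role: 'user', content: [{ type: 'image', image: './photo.jpg' }] }),
//     () => client.chat('Describe a sunset')
//   );
```

Passing the two requests as functions keeps the helper independent of the client, so the fallback text can be chosen per call.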
Next Steps
- Error Handling - Comprehensive error handling patterns
- API Reference: Images - Image utility functions
