Overview
Multimodal chat enables AI models to understand and process multiple types of content beyond text, including images, documents, PDFs, and other files. This allows for visual question answering, document analysis, and rich interactive conversations.Supported Content Types
- Images
- Documents
- Code & Data
- Other Files
Vision-enabled models can analyze images:
- Formats: PNG, JPG, JPEG, GIF, WebP
- Use cases:
- Describe images
- Extract text (OCR)
- Answer questions about visual content
- Analyze charts and diagrams
- Compare multiple images
Vision-Enabled Models
Not all models support multimodal input. Vision-capable models include:OpenAI
OpenAI
- GPT-4o
- GPT-4o-mini
- GPT-4 Turbo with Vision
- GPT-4V
Anthropic
Anthropic
- Claude Sonnet 4
- Claude Opus 4
- Claude 3.7 Sonnet
- Claude 3.5 Sonnet
- Claude 3 Opus
Google
- Gemini 3.1 Pro
- Gemini 2.5 Pro
- Gemini 2.5 Flash
- Gemini 2.0 Flash
- All Gemini models support vision
Uploading Files
Select Files
Choose one or more files from your device.
Most models support multiple file uploads in a single message.
Wait for Upload
Files are uploaded and processed before sending. You’ll see:
- Upload progress indicator
- Thumbnail previews for images
- File name and size for documents
Add Context (Optional)
Type a message to provide context or ask specific questions about the uploaded files.
Example Use Cases
- Image Analysis
- Document Summarization
- Data Analysis
- Code Review
- Visual Comparison
File Configuration
Configure file upload limits and restrictions:Client-Side Image Resizing
Automatically resize large images before upload:Rate Limiting
Control file upload frequency:Image Vision in Agents
Enable vision capabilities for agents:File Storage
Configure where uploaded files are stored:- Local Storage
- AWS S3
- Firebase
- Granular Strategy
Best Practices
Limitations
Troubleshooting
Upload fails
Upload fails
- Check file size against limits
- Verify file type is supported
- Ensure sufficient storage space
- Check network connectivity
Model can't see images
Model can't see images
- Verify model supports vision (GPT-4o, Claude Sonnet, Gemini)
- Check image format is supported
- Try re-uploading the image
- Ensure image isn’t corrupted
Poor image analysis
Poor image analysis
- Use higher quality images
- Ensure images are well-lit and clear
- Crop to relevant areas
- Try different prompting
File storage errors
File storage errors
- Check storage configuration in
librechat.yaml - Verify S3/Firebase credentials if using cloud storage
- Ensure server has disk space for local storage
- Check file permissions