
Overview

LibreChat supports multiple image generation engines that can be used by AI agents to create images based on text descriptions. Agents can generate images as part of conversations or workflows.

Supported Engines

DALL-E: OpenAI’s DALL-E 2 and DALL-E 3 models.
# .env
DALLE_API_KEY=your-openai-key
DALLE3_API_KEY=your-openai-key  # Optional: separate key for DALL-E 3
# Optional customization (also set in .env)
# DALLE3_SYSTEM_PROMPT="Custom system prompt"
# DALLE3_BASEURL="https://api.openai.com/v1"
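The key variables above behave like a fallback chain. The minimal sketch below assumes that `DALLE3_API_KEY`, when set, takes precedence over the shared `DALLE_API_KEY`, and it treats blank values as unset. `resolveDalle3Settings` is a hypothetical helper, not LibreChat's own code.

```typescript
// Hypothetical helper (not LibreChat's code): resolves DALL-E 3 settings
// from .env values. Assumption: DALLE3_API_KEY, when set, overrides the
// shared DALLE_API_KEY.
type Env = Record<string, string | undefined>;

interface Dalle3Settings {
  apiKey: string;
  systemPrompt?: string;
}

// Returns the first variable that is set and not blank.
function firstSet(env: Env, ...names: string[]): string | undefined {
  for (const varName of names) {
    const value = env[varName]?.trim();
    if (value) return value;
  }
  return undefined;
}

function resolveDalle3Settings(env: Env): Dalle3Settings {
  const apiKey = firstSet(env, "DALLE3_API_KEY", "DALLE_API_KEY");
  if (apiKey === undefined) {
    throw new Error("Set DALLE3_API_KEY or DALLE_API_KEY in .env");
  }
  return { apiKey, systemPrompt: firstSet(env, "DALLE3_SYSTEM_PROMPT") };
}
```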

Configuration

1. Choose Image Provider

Select one or more image generation engines based on your needs:
  • DALL-E: Best integration, commercial use allowed
  • Flux: High quality, flexible licensing
  • Stable Diffusion: Self-hosted, fully customizable
  • Gemini: Multimodal integration with Google models

2. Configure API Keys

Add the necessary API keys to your .env file:
# .env - Choose the engines you want
DALLE3_API_KEY=your-openai-key
FLUX_API_KEY=your-flux-key
SD_WEBUI_URL=http://localhost:7860
GEMINI_API_KEY=your-gemini-key
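A quick way to see which engines the variables above enable is to test whether each one is set. The sketch below is hypothetical (not LibreChat's startup logic) and assumes an engine counts as configured when any of its variables is non-empty. The Gemini entry also accepts `GOOGLE_KEY` and `GOOGLE_SERVICE_KEY_FILE`, the alternatives described under Gemini Authentication.

```typescript
// Hypothetical check: which image engines have a .env variable set.
// Assumption: an engine is configured when any listed variable is non-empty.
type Env = Record<string, string | undefined>;

const ENGINE_VARS: Array<[string, string[]]> = [
  ["DALL-E", ["DALLE3_API_KEY", "DALLE_API_KEY"]],
  ["Flux", ["FLUX_API_KEY"]],
  ["Stable Diffusion", ["SD_WEBUI_URL"]],
  ["Gemini", ["GEMINI_API_KEY", "GOOGLE_KEY", "GOOGLE_SERVICE_KEY_FILE"]],
];

function configuredEngines(env: Env): string[] {
  const engines: string[] = [];
  for (const [engine, vars] of ENGINE_VARS) {
    // Blank values (e.g. FLUX_API_KEY=) do not count as configured.
    if (vars.some((v) => (env[v] ?? "").trim() !== "")) {
      engines.push(engine);
    }
  }
  return engines;
}
```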

3. Enable for Agents

Once the API keys are configured, image generation becomes available as a tool for agents. No additional settings are needed in librechat.yaml.

Using Image Generation

In Conversations

Ask an agent with image generation enabled for an image in plain language, for example:

Generate an image of a serene mountain landscape at sunset
with a lake in the foreground.

Image Generation Tool

When an agent has image generation enabled, it calls the tool automatically whenever a request needs an image. The call looks roughly like this:
// Illustrative shape of the tool call an agent issues
const toolCall = {
  tool: "generate_image",
  parameters: {
    prompt: "A serene mountain landscape at sunset",
    size: "1024x1024",
    quality: "hd" // "hd" is a DALL-E 3 option
  }
};
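For DALL-E 3, these tool parameters line up with the body of OpenAI's Images API (`POST /v1/images/generations`), which accepts `model`, `prompt`, `size`, `quality`, and `n` (DALL-E 3 produces one image per request). The sketch below shows that mapping under stated assumptions: the `ImageToolCall` type and `toImagesRequest` helper are hypothetical, not LibreChat internals, and the `1024x1024` size default is an assumption.

```typescript
// Hypothetical shape of the agent's tool call (mirrors the example above).
interface ImageToolCall {
  tool: "generate_image";
  parameters: {
    prompt: string;
    size?: string;
    quality?: "standard" | "hd";
  };
}

// Request body for OpenAI's Images API (POST /v1/images/generations).
interface ImagesRequest {
  model: "dall-e-3";
  prompt: string;
  size: string;
  quality: "standard" | "hd";
  n: 1; // DALL-E 3 generates one image per request
}

function toImagesRequest(call: ImageToolCall): ImagesRequest {
  const { prompt, size, quality } = call.parameters;
  if (prompt.trim() === "") {
    throw new Error("prompt must not be empty");
  }
  return {
    model: "dall-e-3",
    prompt,
    size: size ?? "1024x1024", // assumed default, matching the example
    quality: quality ?? "standard",
    n: 1,
  };
}

const body = toImagesRequest({
  tool: "generate_image",
  parameters: { prompt: "A serene mountain landscape at sunset", quality: "hd" },
});
```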

Generation Progress

During image generation, users see a progress indicator:
// Progress tracking component
<ImageGen
  initialProgress={0.1}
  args="Creating Image..."
/>
The progress animation shows:
  • Visual spinner
  • Status text (“Creating Image”, “Finished”)
  • Progress percentage
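The indicator's behavior amounts to a mapping from a progress value to what is displayed. This hypothetical sketch (not LibreChat's `ImageGen` implementation) assumes progress runs from 0 to 1, as `initialProgress={0.1}` suggests, and that the spinner stops once progress reaches 1.

```typescript
// Hypothetical helper: derives the indicator's status text, percentage,
// and spinner visibility from a progress value between 0 and 1.
interface ProgressView {
  status: "Creating Image" | "Finished";
  percent: number;
  showSpinner: boolean;
}

function progressView(progress: number): ProgressView {
  // Clamp so out-of-range values never show below 0% or above 100%.
  const clamped = Math.min(1, Math.max(0, progress));
  const done = clamped >= 1;
  return {
    status: done ? "Finished" : "Creating Image",
    percent: Math.round(clamped * 100),
    showSpinner: !done,
  };
}
```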

File Configuration

Control the size of generated images with either a percentage of the original size or a fixed pixel size (set one or the other):
# librechat.yaml
fileConfig:
  imageGeneration:
    percentage: 100  # Scale to percentage of original
    # OR
    px: 1024  # Fixed pixel size
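The two options differ in what they fix: `percentage` scales both sides by the same factor, while `px` sets an absolute size. The sketch below is hypothetical and assumes `px` applies to the longer side with the aspect ratio preserved; that interpretation is an assumption, not documented LibreChat behavior.

```typescript
// Hypothetical sketch of the two sizing modes in fileConfig.imageGeneration.
// Assumption: `px` sets the longer side and preserves the aspect ratio.
interface ImageSize {
  width: number;
  height: number;
}

type ImageGenerationConfig = { percentage: number } | { px: number };

function targetSize(original: ImageSize, config: ImageGenerationConfig): ImageSize {
  const scale =
    "percentage" in config
      ? config.percentage / 100
      : config.px / Math.max(original.width, original.height);
  return {
    width: Math.round(original.width * scale),
    height: Math.round(original.height * scale),
  };
}
```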

Advanced Options

DALL-E Azure Integration

Use DALL-E through Azure OpenAI:
# .env
DALLE3_AZURE_API_VERSION=2024-02-15-preview
DALLE2_AZURE_API_VERSION=2024-02-15-preview
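Azure OpenAI differs from OpenAI in two ways that matter here: every request carries an `api-version` query parameter, and authentication uses an `api-key` header instead of `Authorization: Bearer`. The sketch below assumes the DALL-E base URL points at your Azure deployment (for example `https://{resource}.openai.azure.com/openai/deployments/{deployment}`); `azureImageRequest` is a hypothetical helper, and the resource and deployment names are placeholders.

```typescript
// Hypothetical sketch: request URL and headers for DALL-E on Azure OpenAI.
// Assumption: the base URL points at the Azure deployment, e.g.
// https://{resource}.openai.azure.com/openai/deployments/{deployment}
interface AzureImageRequest {
  url: string;
  headers: Record<string, string>;
}

function azureImageRequest(baseURL: string, apiVersion: string, apiKey: string): AzureImageRequest {
  // Normalize to exactly one trailing slash so the relative path is appended.
  const base = baseURL.replace(/\/+$/, "") + "/";
  const url = new URL("images/generations", base);
  // Azure requires the API version as a query parameter on every call.
  url.searchParams.set("api-version", apiVersion);
  return {
    url: url.toString(),
    // Azure uses an `api-key` header rather than `Authorization: Bearer`.
    headers: { "api-key": apiKey, "Content-Type": "application/json" },
  };
}

const azure = azureImageRequest(
  "https://my-resource.openai.azure.com/openai/deployments/dalle3",
  "2024-02-15-preview",
  "your-azure-key",
);
```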

Custom DALL-E Endpoints

Route DALL-E requests through a reverse proxy or custom base URLs:
# .env
DALLE_REVERSE_PROXY=https://your-proxy.com
DALLE3_BASEURL=https://custom-dalle.com
DALLE2_BASEURL=https://custom-dalle.com
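How these variables combine is not spelled out here. The sketch below assumes the model-specific base URL wins, then `DALLE_REVERSE_PROXY`, then OpenAI's default `https://api.openai.com/v1`, and that requests go to the standard `/images/generations` path. `imagesEndpoint` is a hypothetical helper illustrating that assumed precedence.

```typescript
// Hypothetical sketch: choosing where DALL-E requests are sent.
// Assumed precedence: model-specific base URL, then DALLE_REVERSE_PROXY,
// then OpenAI's default API URL.
type Env = Record<string, string | undefined>;
type DalleModel = "dall-e-2" | "dall-e-3";

function imagesEndpoint(env: Env, model: DalleModel): string {
  const specific = model === "dall-e-3" ? env["DALLE3_BASEURL"] : env["DALLE2_BASEURL"];
  const base = specific || env["DALLE_REVERSE_PROXY"] || "https://api.openai.com/v1";
  // Join without doubling or dropping the slash between base and path.
  return base.replace(/\/+$/, "") + "/images/generations";
}
```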

Stable Diffusion Configuration

For self-hosted Stable Diffusion WebUI:
  1. Launch Automatic1111 WebUI with API enabled:
    ./webui.sh --api --listen
    
  2. Configure the URL:
    SD_WEBUI_URL=http://localhost:7860
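With `--api`, the WebUI exposes a JSON endpoint at `/sdapi/v1/txt2img` that takes fields such as `prompt`, `negative_prompt`, `steps`, `width`, and `height`, and returns base64-encoded images in an `images` array. The sketch below only builds that request; `txt2imgRequest` and its defaults (20 steps, 512x512) are illustrative assumptions, not LibreChat's settings.

```typescript
// Hypothetical sketch: building a request for Automatic1111's txt2img API.
// Default values are illustrative assumptions, not LibreChat's settings.
interface Txt2ImgRequest {
  url: string;
  body: {
    prompt: string;
    negative_prompt: string;
    steps: number;
    width: number;
    height: number;
  };
}

function txt2imgRequest(sdWebUiUrl: string, prompt: string, size = 512): Txt2ImgRequest {
  return {
    // Strip trailing slashes so the API path is joined cleanly.
    url: sdWebUiUrl.replace(/\/+$/, "") + "/sdapi/v1/txt2img",
    body: { prompt, negative_prompt: "", steps: 20, width: size, height: size },
  };
}

const sdRequest = txt2imgRequest("http://localhost:7860", "a lighthouse at dusk");
```

To generate, POST `sdRequest.body` as JSON to `sdRequest.url`.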
    

Gemini Authentication

Multiple authentication methods are supported:
  • API key, the default and easiest option for development:
    GEMINI_API_KEY=your-key
  • Google key, shared with the Google chat endpoint:
    GOOGLE_KEY=your-key
  • Service account, for production deployments:
    GOOGLE_SERVICE_KEY_FILE=/path/to/service-account.json
    GOOGLE_LOC=us-central1
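The three methods can be treated as alternatives checked in turn. The sketch below is hypothetical: checking them in the order listed above, and falling back to `us-central1` when `GOOGLE_LOC` is unset, are assumptions rather than documented LibreChat behavior.

```typescript
// Hypothetical sketch: picking a Gemini authentication method from .env.
// Assumption: methods are tried in the order listed above.
type Env = Record<string, string | undefined>;

type GeminiAuth =
  | { kind: "apiKey"; key: string; source: "GEMINI_API_KEY" | "GOOGLE_KEY" }
  | { kind: "serviceAccount"; keyFile: string; location: string };

function selectGeminiAuth(env: Env): GeminiAuth {
  const geminiKey = env["GEMINI_API_KEY"];
  if (geminiKey) return { kind: "apiKey", key: geminiKey, source: "GEMINI_API_KEY" };

  const googleKey = env["GOOGLE_KEY"];
  if (googleKey) return { kind: "apiKey", key: googleKey, source: "GOOGLE_KEY" };

  const keyFile = env["GOOGLE_SERVICE_KEY_FILE"];
  if (keyFile) {
    // Fallback region is an assumption for this sketch.
    return { kind: "serviceAccount", keyFile, location: env["GOOGLE_LOC"] || "us-central1" };
  }
  throw new Error("No Gemini credentials found in .env");
}
```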

Image Specifications

DALL-E

  • DALL-E 2: 256x256, 512x512, 1024x1024
  • DALL-E 3: 1024x1024, 1792x1024, 1024x1792
  • Formats: PNG
  • Quality: Standard or HD (DALL-E 3)
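Because each DALL-E model accepts only fixed sizes, a request can be checked before it is sent. This hypothetical validator encodes the sizes listed above and the rule that HD quality applies only to DALL-E 3.

```typescript
// Supported sizes per DALL-E model, as listed above.
type DalleModel = "dall-e-2" | "dall-e-3";

const DALLE_SIZES: Record<DalleModel, string[]> = {
  "dall-e-2": ["256x256", "512x512", "1024x1024"],
  "dall-e-3": ["1024x1024", "1792x1024", "1024x1792"],
};

// Hypothetical validator: throws if the model does not support the request.
function validateDalleRequest(
  model: DalleModel,
  size: string,
  quality: "standard" | "hd" = "standard",
): void {
  if (DALLE_SIZES[model].indexOf(size) === -1) {
    throw new Error(`${model} does not support ${size}; use one of ${DALLE_SIZES[model].join(", ")}`);
  }
  if (quality === "hd" && model !== "dall-e-3") {
    throw new Error("HD quality is only available for dall-e-3");
  }
}
```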

Flux

  • Resolutions: Up to 2048x2048
  • Formats: PNG, JPEG
  • Models: Flux Pro, Flux Dev, Flux Schnell

Stable Diffusion

  • Fully customizable based on your model
  • Common: 512x512, 768x768, 1024x1024
  • Supports various samplers and schedulers

Gemini

  • Model: gemini-2.5-flash-image
  • Integrated with Gemini multimodal capabilities
  • Supports various aspect ratios

Troubleshooting

API Errors

  • Verify API keys are correct and active
  • Check API quota and billing status
  • Ensure network connectivity to the image service
  • Review error messages in the logs

Stable Diffusion Connection

  • Verify the WebUI is running with the --api flag
  • Check firewall settings
  • In Docker setups, use host.docker.internal instead of localhost to reach a WebUI running on the host
  • Confirm the URL includes the port number (e.g. http://localhost:7860)

DALL-E Errors

  • DALL-E 3 enforces content policy restrictions
  • Avoid restricted content (violence, adult content, etc.)
  • Rephrase prompts if they are rejected
  • Check the OpenAI status page for outages

Images Not Displaying

  • Check the file storage configuration
  • Verify the browser can access the image URLs
  • Check file size limits in fileConfig
  • Clear the browser cache
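The Stable Diffusion URL checks can be automated. This hypothetical `checkSdWebUiUrl` helper is not part of LibreChat; it assumes the WebUI's default port of 7860 and treats `localhost` or `127.0.0.1` inside a container as a likely misconfiguration.

```typescript
// Hypothetical diagnostic for SD_WEBUI_URL, mirroring the checklist above.
function parseUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null;
  }
}

function checkSdWebUiUrl(raw: string, runningInDocker: boolean): string[] {
  const url = parseUrl(raw);
  if (url === null || (url.protocol !== "http:" && url.protocol !== "https:")) {
    return ["Not a valid http(s) URL; include the scheme, e.g. http://localhost:7860"];
  }
  const warnings: string[] = [];
  // url.port is empty when no port (or the scheme's default port) is given.
  if (url.port === "") {
    warnings.push("No port given; the WebUI listens on 7860 by default");
  }
  // Inside a container, localhost refers to the container itself.
  if (runningInDocker && (url.hostname === "localhost" || url.hostname === "127.0.0.1")) {
    warnings.push("Running in Docker: try host.docker.internal instead of localhost");
  }
  return warnings;
}
```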

Best Practices

  • Detailed prompts: More specific descriptions produce better results
  • Style keywords: Include art style, lighting, perspective
  • Iterate: Refine prompts based on results
  • Multiple engines: Different engines excel at different styles
  • Cost awareness: Monitor API usage, especially for DALL-E 3

Example Prompts

Good: "A photorealistic portrait of a golden retriever puppy
playing in autumn leaves, soft natural lighting, shallow depth
of field, professional photography"

Poor: "dog picture"
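The best practices above can be applied mechanically by assembling a prompt from a specific subject plus lighting, perspective, and style keywords. `buildPrompt` is a hypothetical helper; the Good example decomposes into exactly those parts.

```typescript
// Hypothetical prompt builder following the best practices above:
// a specific subject plus lighting, perspective, and style keywords.
interface PromptParts {
  subject: string;
  lighting?: string;
  perspective?: string;
  style?: string;
}

function buildPrompt(parts: PromptParts): string {
  const pieces = [parts.subject, parts.lighting, parts.perspective, parts.style];
  // Drop missing or blank parts, then join with commas.
  return pieces
    .filter((p): p is string => p !== undefined && p.trim() !== "")
    .join(", ");
}

const goodPrompt = buildPrompt({
  subject: "A photorealistic portrait of a golden retriever puppy playing in autumn leaves",
  lighting: "soft natural lighting",
  perspective: "shallow depth of field",
  style: "professional photography",
});
```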