👁️ Vision Council

Multi-modal models that understand images, screenshots, and visual content. OCR, diagram analysis, and visual Q&A.
📷 Use case: Screenshot review, OCR, visual analysis
Models: 7
Providers: 6
Max speed: 225 tok/s
Best price: Free

📊 Council Models

| Model | Provider | Speed | Rating | Context | Price | Notes |
|---|---|---|---|---|---|---|
| llama-4-scout-17b | Groq | 225 t/s | ★★★½★ | 128K | Free | Fastest vision model |
| step-3-vl | StepFun | 93 t/s | ★★★★★ | 32K | | StepFun vision-language model |
| claude-sonnet-4-5 | Venice | 35 t/s | ★★★★½ | 200K | | Uncensored, best vision quality |
| google/gemini-2.5-pro:free | OpenRouter | 46 t/s | ★★★★★ | 1M | Free | 1M context, free tier |
| google/gemini-2.5-pro | ZenMux | 37 t/s | ★★★★★ | 1M | Free | 1M context, 700 req/day |
| gemma-4-31b-turbo-tee | Chutes TEE | 28 t/s | ★★★½★ | 128K | | TEE-secured vision model |
| kimi-k2.5-tee | Chutes TEE | 28 t/s | ★★★★★ | 128K | | Vision + video understanding |

⚖️ Pros & Cons

✅ Pros

  • Gemini 2.5 Pro has a 1M-token context window, enough to analyze entire documents
  • Free options available (Groq llama-4-scout, OpenRouter, ZenMux)
  • Claude Sonnet 4.5 via Venice is uncensored for all content
  • TEE models for privacy-sensitive image analysis
  • kimi-k2.5 supports video frame understanding

⚠️ Cons

  • Vision models are generally slower than text-only models
  • Image analysis quality varies significantly across providers
  • TEE vision models are the slowest (28 t/s)
  • StepFun's context window is limited (32K)
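
To put the speed figures in perspective, the sketch below estimates how long each council model would need to generate a 1,000-token reply (the max_tokens value used in the request further down). It assumes generation time is simply tokens divided by the listed t/s and ignores image upload, queueing, and time to first token, so the results are rough lower bounds. The function name estimate_seconds is illustrative.

```shell
# Rough generation-time estimate: seconds = tokens / (tokens per second).
# Assumption: ignores network latency, image preprocessing, and time to first token.
estimate_seconds() {
  # $1 = tokens to generate, $2 = throughput in tokens per second
  awk -v tokens="$1" -v tps="$2" 'BEGIN { printf "%.1f\n", tokens / tps }'
}

# Throughput figures copied from the Council Models table (model=t/s).
for entry in \
  "llama-4-scout-17b=225" \
  "step-3-vl=93" \
  "google/gemini-2.5-pro:free=46" \
  "google/gemini-2.5-pro=37" \
  "claude-sonnet-4-5=35" \
  "gemma-4-31b-turbo-tee=28" \
  "kimi-k2.5-tee=28"
do
  model=${entry%=*}   # text before the "="
  tps=${entry#*=}     # text after the "="
  printf '%-28s %6s s\n' "$model" "$(estimate_seconds 1000 "$tps")"
done
```

Under these assumptions the Groq model finishes in about 4.4 s, while the TEE models need about 35.7 s for the same reply.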

🚀 Try This Council

Send an image to one or more Vision Council models for analysis. The request below targets Gemini 2.5 Pro through OpenRouter.

# Vision Council: Analyze an image with Gemini 2.5 Pro
curl -X POST https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-pro:free",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe what you see in this image in detail."},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }],
    "max_tokens": 1000
  }'
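
The request above points at a publicly hosted image. For the screenshot-review use case the image usually lives on disk, and OpenAI-style chat endpoints such as OpenRouter's generally also accept a base64 data: URL in the same image_url field. The sketch below builds that payload for a local file. The helper names (image_data_url, build_payload), the screenshot.png path, and the assumptions that the file is a PNG and the prompt contains no newlines are illustrative and not part of the original example.

```shell
# Encode a local image as a data URL (assumes PNG; change the MIME type for JPEG/WebP).
image_data_url() {
  # tr strips the line wraps that some base64 implementations insert.
  printf 'data:image/png;base64,%s' "$(base64 < "$1" | tr -d '\n')"
}

# Build the same chat-completions body as the curl example, for any model and image URL.
# $1 = model id, $2 = prompt text, $3 = image URL (https:// or data:)
build_payload() {
  # Minimal escaping: backslashes and double quotes only (no newlines in the prompt).
  prompt=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"model":"%s","messages":[{"role":"user","content":[{"type":"text","text":"%s"},{"type":"image_url","image_url":{"url":"%s"}}]}],"max_tokens":1000}' \
    "$1" "$prompt" "$3"
}

# Usage (needs network access and OPENROUTER_API_KEY):
#   build_payload "google/gemini-2.5-pro:free" \
#     "List any UI problems in this screenshot." \
#     "$(image_data_url screenshot.png)" |
#   curl -X POST https://openrouter.ai/api/v1/chat/completions \
#     -H "Authorization: Bearer $OPENROUTER_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d @-
```

Piping the body through -d @- keeps large base64 payloads off the command line. To consult several council models, the same call can be repeated with each model ID the provider exposes; models hosted by other providers in the table would need their own endpoint and API key.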