⚡ Speed Council

The fastest models for rapid answers and high-throughput workloads. When latency matters most, this council delivers.
๐ŸŽ๏ธ Use case: Quick answers, high throughput, real-time chat
  • Models: 7
  • Providers: 5
  • Max tok/s: 340
  • Best price: Free

📊 Council Models

| Model             | Provider    | Speed   | ★     | Context | Price | Notes                    |
|-------------------|-------------|---------|-------|---------|-------|--------------------------|
| qwen/qwen3-32b    | Groq        | 340 t/s | ★★★★★ | 128K    | Free  | Fastest in the council   |
| groq/compound     | Groq        | 289 t/s | ★★★★½ | 128K    | Free  | Compound reasoning, fast |
| llama-3.3-70b     | Groq        | 212 t/s | ★★★★★ | 128K    | Free  | Most popular fast model  |
| nemotron-3-super  | OllamaCloud | 102 t/s | ★★★★★ | 128K    | Free  | NVIDIA Nemotron          |
| step-3-chat       | StepFun     | 93 t/s  | ★★★★★ | 32K     | Paid  | StepFun flagship chat    |
| glm-5.1           | OpenCode    | 69 t/s  | ★★★★½ | 128K    | Paid  | GLM latest, strong quality |
| deepseek-v4-flash | OpenCode    | 58 t/s  | ★★★★½ | 128K    | Paid  | Reasoning + speed combo  |

โš–๏ธ Pros & Cons

✅ Pros

  • Fastest time-to-first-token across all councils
  • Groq-hosted models in this council are free to use
  • qwen/qwen3-32b at 340 t/s is the fastest model in the council
  • Multiple free options for zero-cost high-throughput
  • Great for real-time chat and streaming applications

โš ๏ธ Cons

  • Faster models may sacrifice some reasoning depth
  • Groq doesn't support the largest parameter models
  • Speed-focused models may have smaller context windows
  • Paid options (StepFun, OpenCode) are slower than free ones

🚀 Try This Council

Send the same prompt to multiple Speed Council models simultaneously and pick the fastest or best response.

# Speed Council: Query the fastest model
curl -X POST https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-32b",
    "messages": [{"role": "user", "content": "Explain quantum computing in 2 sentences"}],
    "temperature": 0.3,
    "max_tokens": 500
  }'
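The curl call above queries a single model. To actually race the council, fan the same prompt out to several models in parallel and take whichever reply lands first. A minimal Python sketch of that pattern, assuming the three free Groq models from the table; `fake_ask` and its latencies are hypothetical stand-ins, and in practice `ask` would wrap the API request shown above:

```python
import concurrent.futures
import time

# Three of the free Groq-hosted models from the council table.
COUNCIL_MODELS = ["qwen/qwen3-32b", "groq/compound", "llama-3.3-70b"]

def race_council(ask, models=COUNCIL_MODELS, timeout=30.0):
    """Send the same prompt to every model in parallel and return the
    (model, reply) pair from whichever request finishes first."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {pool.submit(ask, m): m for m in models}
        # as_completed yields futures in finish order; next() grabs the winner.
        first = next(concurrent.futures.as_completed(futures, timeout=timeout))
        return futures[first], first.result()

# Hypothetical stand-in for the real API call, with sleeps that mirror
# the relative speeds in the table (qwen is fastest).
def fake_ask(model):
    time.sleep({"qwen/qwen3-32b": 0.01,
                "groq/compound": 0.05,
                "llama-3.3-70b": 0.10}[model])
    return f"reply from {model}"

winner, reply = race_council(fake_ask)
print(winner)  # qwen/qwen3-32b
```

With a real `ask`, the race also hedges against transient provider slowdowns: a normally slower model can still win a round if the fastest one stalls.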