Speed Council
The fastest models for rapid answers and high-throughput workloads. When latency matters most, this council delivers.
🎯 Use case: Quick answers, high throughput, real-time chat
**7** models · **5** providers · up to **340** tok/s · best price: **Free**
🏛️ Council Models
| Model | Provider | Speed | Rating | Context | Price | Notes |
|---|---|---|---|---|---|---|
| qwen/qwen3-32b | Groq | 340 t/s | ★★★★★ | 128K | Free | Fastest in the council |
| groq/compound | Groq | 289 t/s | ★★★½☆ | 128K | Free | Compound reasoning, fast |
| llama-3.3-70b | Groq | 212 t/s | ★★★★★ | 128K | Free | Most popular fast model |
| nemotron-3-super | OllamaCloud | 102 t/s | ★★★★★ | 128K | Free | NVIDIA Nemotron |
| step-3-chat | StepFun | 93 t/s | ★★★★★ | 32K | $9/mo | StepFun flagship chat |
| glm-5.1 | OpenCode | 69 t/s | ★★★★½ | 128K | $10/mo | GLM latest, strong quality |
| deepseek-v4-flash | OpenCode | 58 t/s | ★★★★½ | 128K | $10/mo | Reasoning + speed combo |
⚖️ Pros & Cons
✅ Pros
- Fastest time-to-first-token across all councils
- Groq models are completely free with no rate limits
- qwen/qwen3-32b at 340 t/s is the fastest model available
- Multiple free options for zero-cost, high-throughput workloads
- Great for real-time chat and streaming applications
⚠️ Cons
- Faster models may sacrifice some reasoning depth
- Groq doesn't support the largest parameter models
- Speed-focused models may have smaller context windows
- Paid options (StepFun, OpenCode) are slower than free ones
🚀 Try This Council
Send the same prompt to multiple Speed Council models simultaneously and pick the fastest/best response.
```bash
# Speed Council: query the fastest model in the table
curl -X POST https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-32b",
    "messages": [{"role": "user", "content": "Explain quantum computing in 2 sentences"}],
    "temperature": 0.3,
    "max_tokens": 500
  }'
```
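The fan-out pattern above can be sketched in Python: submit the same prompt to several council models concurrently and keep whichever reply lands first. The `ask` function here is a hypothetical stand-in for a real chat-completions HTTP call (it sleeps in proportion to each model's advertised throughput so the sketch runs offline); swap in your provider's client to use it for real.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Advertised throughput from the council table (tokens/second).
SPEEDS = {"qwen/qwen3-32b": 340, "groq/compound": 289, "llama-3.3-70b": 212}

def ask(model: str, prompt: str) -> tuple[str, str]:
    # Placeholder for a real API call: simulate latency inversely
    # proportional to the model's advertised speed.
    time.sleep(50.0 / SPEEDS[model])
    return model, f"[{model}] answer to: {prompt}"

def fastest_answer(prompt: str) -> tuple[str, str]:
    """Race all council models on one prompt; return the first reply."""
    with ThreadPoolExecutor(max_workers=len(SPEEDS)) as pool:
        futures = [pool.submit(ask, m, prompt) for m in SPEEDS]
        # as_completed yields futures in finish order; take the winner.
        return next(as_completed(futures)).result()

model, answer = fastest_answer("Explain quantum computing in 2 sentences")
```

A variant of the same pattern keeps *all* responses and picks the best one by a quality heuristic instead of pure speed; `as_completed` supports both, since you can consume one future or the whole iterator.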