Speed Council
The fastest models for rapid answers and high-throughput workloads. When latency matters most, this council delivers.
🎯 Use case: Quick answers, high throughput, real-time chat
**7** models · **5** providers · up to **340** tok/s · best price: **Free**
🏛️ Council Models
| Model | Provider | Speed | Rating | Context | Price | Notes |
|---|---|---|---|---|---|---|
| qwen/qwen3-32b | Groq | 340 t/s | ★★★★★ | 128K | Free | Fastest in the council |
| groq/compound | Groq | 289 t/s | ★★★½☆ | 128K | Free | Compound reasoning, fast |
| llama-3.3-70b | Groq | 212 t/s | ★★★★★ | 128K | Free | Most popular fast model |
| nemotron-3-super | OllamaCloud | 102 t/s | ★★★★★ | 128K | Free | NVIDIA Nemotron |
| step-3-chat | StepFun | 93 t/s | ★★★★★ | 32K | $9/mo | StepFun flagship chat |
| glm-5.1 | OpenCode | 69 t/s | ★★★★½ | 128K | $10/mo | GLM latest, strong quality |
| deepseek-v4-flash | OpenCode | 58 t/s | ★★★★½ | 128K | $10/mo | Reasoning + speed combo |
⚖️ Pros & Cons
✅ Pros
- Fastest time-to-first-token across all councils
- Groq models are completely free with no rate limits
- qwen/qwen3-32b at 340 t/s is the fastest model available
- Multiple free options for zero-cost, high-throughput workloads
- Great for real-time chat and streaming applications
⚠️ Cons
- Faster models may sacrifice some reasoning depth
- Groq doesn't support the largest parameter models
- Speed-focused models may have smaller context windows
- Paid options (StepFun, OpenCode) are slower than free ones
🚀 Try This Council
Send the same prompt to multiple Speed Council models simultaneously and pick the fastest/best response.
```bash
# Speed Council: query the fastest model in the table
curl -X POST https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-32b",
    "messages": [{"role": "user", "content": "Explain quantum computing in 2 sentences"}],
    "temperature": 0.3,
    "max_tokens": 500
  }'
```
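The fan-out pattern above can be sketched in Python: submit the same prompt to several council models concurrently and keep whichever reply lands first. The `ask` function here is a hypothetical stand-in for a real chat-completions HTTP call (it sleeps in proportion to each model's advertised throughput so the sketch runs offline); swap in your provider's client to use it for real.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Advertised throughput from the council table (tokens/second).
SPEEDS = {"qwen/qwen3-32b": 340, "groq/compound": 289, "llama-3.3-70b": 212}

def ask(model: str, prompt: str) -> tuple[str, str]:
    # Placeholder for a real API call: simulate latency inversely
    # proportional to the model's advertised speed.
    time.sleep(50.0 / SPEEDS[model])
    return model, f"[{model}] answer to: {prompt}"

def fastest_answer(prompt: str) -> tuple[str, str]:
    """Race all council models on one prompt; return the first reply."""
    with ThreadPoolExecutor(max_workers=len(SPEEDS)) as pool:
        futures = [pool.submit(ask, m, prompt) for m in SPEEDS]
        # as_completed yields futures in finish order; take the winner.
        return next(as_completed(futures)).result()

model, answer = fastest_answer("Explain quantum computing in 2 sentences")
```

A variant of the same pattern keeps *all* responses and picks the best one by a quality heuristic instead of pure speed; `as_completed` supports both, since you can consume one future or the whole iterator.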