Scenario guide

Best AI models for Realtime Chat / Voice

This scenario covers a streaming consumer chat or voice assistant. Speed and time-to-first-token matter as much as raw quality: a slightly less capable model that responds instantly often beats a frontier model that pauses to think.

Rankings use the same scenario weights and cost blending as the interactive leaderboard on AI Model Analyzer. Data is min-max normalised per benchmark; missing scores are skipped without penalty.
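As a rough illustration of the normalisation step described above, the sketch below min-max normalises each benchmark across models and then takes a weighted average over only the benchmarks a model actually reports. The benchmark names, weights, 0-100 scaling, and the choice to renormalise weights over available benchmarks are all assumptions made for this example; the leaderboard's actual scenario weights and cost blending are not specified here and are not reproduced.

```python
from typing import Dict, Optional

# model name -> benchmark name -> raw score (None when the score is missing)
Scores = Dict[str, Dict[str, Optional[float]]]


def min_max_normalise(raw: Scores) -> Scores:
    """Rescale each benchmark to [0, 1] across the models that report it."""
    benchmarks = {b for per_model in raw.values() for b in per_model}
    out: Scores = {model: {} for model in raw}
    for b in benchmarks:
        present = [s[b] for s in raw.values() if s.get(b) is not None]
        lo = min(present) if present else 0.0
        span = (max(present) - lo) if present else 0.0
        for model, s in raw.items():
            v = s.get(b)
            if v is None:
                out[model][b] = None  # stays missing; skipped when scoring
            else:
                # If every model has the same value, treat it as full marks.
                out[model][b] = (v - lo) / span if span else 1.0
    return out


def weighted_score(
    normalised: Dict[str, Optional[float]], weights: Dict[str, float]
) -> Optional[float]:
    """Weighted mean on a 0-100 scale, ignoring missing benchmarks.

    Assumption: "skipped without penalty" is modelled by renormalising
    the weights over the benchmarks that are present.
    """
    total = 0.0
    weight_sum = 0.0
    for b, w in weights.items():
        v = normalised.get(b)
        if v is None:
            continue
        total += w * v
        weight_sum += w
    return 100.0 * total / weight_sum if weight_sum else None


if __name__ == "__main__":
    # Hypothetical models, benchmarks, and weights, for illustration only.
    raw = {
        "model_a": {"speed": 200.0, "quality": 70.0},
        "model_b": {"speed": 100.0, "quality": 90.0},
        "model_c": {"speed": 150.0, "quality": None},
    }
    weights = {"speed": 0.6, "quality": 0.4}
    norm = min_max_normalise(raw)
    scores = {m: weighted_score(norm[m], weights) for m in raw}
    for model, score in sorted(scores.items(), key=lambda kv: -(kv[1] or 0.0)):
        print(model, score)
```

In this sketch a model with a missing benchmark is judged only on what it reports, so it is neither rewarded nor penalised for the gap.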

| Rank | Model | Provider | Score | Quality (Q) | Input price (In, $ per M tokens) |
|------|-------|----------|-------|-------------|----------------------------------|
| 1 | Gemini 2.0 Flash | Google | 91.3 | 89.9 | $0.10 |
| 2 | DeepSeek V3 | DeepSeek | 89.9 | 95.8 | $0.27 |
| 3 | Gemini 1.5 Flash | Google | 88.9 | 84.1 | $0.08 |
| 4 | DeepSeek R1 | DeepSeek | 86.4 | 96.3 | $0.55 |
| 5 | Gemini 3 Flash | Google | 84.5 | 92.3 | $0.30 |
| 6 | Gemini 2.5 Flash | Google | 83.0 | 90.1 | $0.30 |
| 7 | GLM-4.6 | Zhipu AI (GLM) | 83.0 | 90.1 | $0.50 |
| 8 | Qwen3 235B (Thinking) | Alibaba (Qwen) | 82.3 | 81.4 | $0.20 |
| 9 | DeepSeek V3 (Thinking) | DeepSeek | 82.2 | 84.8 | $0.27 |
| 10 | GLM-4.7 | Zhipu AI (GLM) | 81.3 | 87.8 | $0.50 |
| 11 | GPT-5.4 | OpenAI | 81.2 | 100.0 | $1.50 |
| 12 | GPT-5.5 | OpenAI | 80.4 | 98.9 | $1.50 |
| 13 | Kimi K2 | Moonshot (Kimi) | 77.1 | 84.0 | $0.60 |
| 14 | Gemini 3 Pro | Google | 76.5 | 91.9 | $1.25 |
| 15 | GPT-5.1 | OpenAI | 75.4 | 90.4 | $1.25 |
