Scenario guide

Best AI models for Customer Support Bot

A high-volume B2C chatbot. We weight Arena (human preference) and IFEval (does it follow your formatting instructions?) heavily, and lean on cost because volume is enormous. Reliability axes (format adherence, safety handling) also carry a modest weight, because a chatbot that answers off-format or falsely refuses benign requests is unusable regardless of how smart it is.

Rankings use the same scenario weights and cost blending as the interactive leaderboard on AI Model Analyzer. Data is min-max normalised per benchmark; missing scores are skipped without penalty.
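
As a rough illustration of the normalisation described above, the following Python sketch min-max normalises each benchmark across models and skips missing scores by renormalising the remaining weights. It is not the leaderboard's actual code: the function names, the example models, the raw scores, and the weights (0.6 / 0.4) are all hypothetical, and cost blending is left out because the page does not say how it is computed.

```python
from typing import Dict, Optional

# model name -> benchmark name -> raw score (None when the score is missing)
Scores = Dict[str, Dict[str, Optional[float]]]


def min_max_normalise(raw: Scores) -> Scores:
    """Rescale every benchmark to [0, 1] across the models that report it."""
    benchmarks = {bench for per_model in raw.values() for bench in per_model}
    normalised: Scores = {model: {} for model in raw}
    for bench in benchmarks:
        present = [s[bench] for s in raw.values() if s.get(bench) is not None]
        if not present:
            continue  # nobody reports this benchmark, nothing to rescale
        lo, hi = min(present), max(present)
        span = hi - lo
        for model, per_model in raw.items():
            value = per_model.get(bench)
            if value is None:
                normalised[model][bench] = None  # keep the gap visible
            else:
                # If all models tie, treat them all as top of the range.
                normalised[model][bench] = (value - lo) / span if span else 1.0
    return normalised


def weighted_score(norm: Dict[str, Optional[float]],
                   weights: Dict[str, float]) -> Optional[float]:
    """Weighted mean over the benchmarks a model has; missing ones are skipped."""
    used = {b: w for b, w in weights.items() if norm.get(b) is not None}
    total = sum(used.values())
    if total == 0:
        return None  # no usable benchmark for this model
    return sum(norm[b] * w for b, w in used.items()) / total


# Hypothetical raw scores and weights, for illustration only.
example_raw: Scores = {
    "model-a": {"arena": 1400.0, "ifeval": 90.0},
    "model-b": {"arena": 1300.0, "ifeval": None},  # IFEval score missing
    "model-c": {"arena": 1350.0, "ifeval": 80.0},
}
example_weights = {"arena": 0.6, "ifeval": 0.4}

example_norm = min_max_normalise(example_raw)
example_ranking = {
    model: weighted_score(scores, example_weights)
    for model, scores in example_norm.items()
}
```

Dividing by the sum of the weights actually used is one way to make a missing score carry no penalty: a model with only an Arena score is judged on Arena alone instead of being treated as scoring zero on IFEval.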

Rank  Model                    Provider          Score  Q     In
1     Qwen3 235B (Thinking)    Alibaba (Qwen)    82.3   81.4  $0.20/M
2     Gemini 3 Flash           Google            82.0   92.3  $0.30/M
3     DeepSeek V3 (Thinking)   DeepSeek          81.2   84.8  $0.27/M
4     GLM-4.6                  Zhipu AI (GLM)    80.3   90.1  $0.50/M
5     DeepSeek V3              DeepSeek          80.2   83.2  $0.27/M
6     Gemini 2.0 Flash         Google            79.4   69.5  $0.10/M
7     GLM-4.7                  Zhipu AI (GLM)    78.9   87.8  $0.50/M
8     DeepSeek R1              DeepSeek          76.5   85.6  $0.55/M
9     Gemini 2.5 Flash         Google            76.2   82.6  $0.30/M
10    Qwen3 235B               Alibaba (Qwen)    75.4   69.9  $0.20/M
11    Gemini 2.5 Pro           Google            74.7   97.3  $1.25/M
12    Kimi K2                  Moonshot (Kimi)   74.7   84.0  $0.60/M
13    GPT-5.5                  OpenAI            74.6   99.4  $1.50/M
14    Gemini 3 Pro             Google            73.5   95.3  $1.25/M
15    GPT-5 nano               OpenAI            70.7   51.2  $0.05/M
