
Scenario guide

Best AI models for Customer Support Bot

This scenario models a high-volume B2C chatbot. We weight Arena (human preference) and IFEval (whether the model follows your formatting instructions) heavily, and lean on cost because volume is enormous. Reliability axes (format adherence, safety handling) carry a modest additional weight, because a chatbot that answers off-format or falsely refuses benign requests is unusable regardless of how smart it is.

Rankings use the same scenario weights and cost blending as the interactive leaderboard on AI Model Analyzer. Data is min-max normalised per benchmark; missing scores are skipped without penalty.
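The normalisation and missing-score handling described above can be sketched as follows. This is a minimal illustration, not the leaderboard's actual code: the model names, raw scores, and weights are hypothetical, the 0 to 100 output range is an assumption, and "skipped without penalty" is interpreted here as renormalising the remaining weights. Cost blending is not shown because the source does not specify it.

```python
from typing import Dict, Optional

# model -> benchmark -> raw score (None means the score is missing)
Scores = Dict[str, Dict[str, Optional[float]]]


def min_max_normalise(raw: Scores) -> Scores:
    """Rescale each benchmark to 0..100 across models; missing scores stay None."""
    benchmarks = {b for per_model in raw.values() for b in per_model}
    out: Scores = {model: {} for model in raw}
    for b in benchmarks:
        present = [s[b] for s in raw.values() if s.get(b) is not None]
        if not present:
            # No model has this benchmark, so nothing can be normalised.
            for model in raw:
                out[model][b] = None
            continue
        lo, hi = min(present), max(present)
        span = hi - lo
        for model, per_model in raw.items():
            value = per_model.get(b)
            if value is None:
                out[model][b] = None
            elif span == 0:
                # Assumption: if every model ties, give all of them full marks.
                out[model][b] = 100.0
            else:
                out[model][b] = 100.0 * (value - lo) / span
    return out


def weighted_score(
    normalised: Dict[str, Optional[float]], weights: Dict[str, float]
) -> Optional[float]:
    """Weighted mean over available benchmarks; missing ones are skipped
    and the remaining weights are renormalised (an assumed interpretation)."""
    used = [
        (normalised[b], w) for b, w in weights.items() if normalised.get(b) is not None
    ]
    total = sum(w for _, w in used)
    if total == 0:
        return None
    return sum(v * w for v, w in used) / total


# Hypothetical inputs, for illustration only.
raw = {
    "model_a": {"arena": 1300.0, "ifeval": 80.0},
    "model_b": {"arena": 1400.0, "ifeval": 90.0},
    "model_c": {"arena": 1350.0, "ifeval": None},  # missing IFEval score
}
weights = {"arena": 0.6, "ifeval": 0.4}

normalised = min_max_normalise(raw)
for model, scores in normalised.items():
    print(model, weighted_score(scores, weights))
```

In this sketch, `model_c` is scored on Arena alone rather than being treated as having an IFEval score of zero, which is one way a missing benchmark can avoid dragging a model down.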

  1. DeepSeek V3 | DeepSeek | Score 81.9 | Q 86.1 | In $0.27/M
  2. Gemini 3 Flash | Google | Score 81.3 | Q 91.1 | In $0.30/M
  3. Qwen3 235B (Thinking) | Alibaba (Qwen) | Score 78.6 | Q 75.2 | In $0.20/M
  4. Gemini 2.5 Flash | Google | Score 78.4 | Q 86.3 | In $0.30/M
  5. DeepSeek R1 | DeepSeek | Score 77.6 | Q 87.4 | In $0.55/M
  6. DeepSeek V3 (Thinking) | DeepSeek | Score 77.2 | Q 78.2 | In $0.27/M
  7. Gemini 2.0 Flash | Google | Score 77.1 | Q 65.8 | In $0.10/M
  8. Kimi K2 | Moonshot (Kimi) | Score 75.7 | Q 85.7 | In $0.60/M
  9. Gemini 3 Pro | Google | Score 75.4 | Q 98.4 | In $1.25/M
  10. Qwen3 235B | Alibaba (Qwen) | Score 73.0 | Q 65.9 | In $0.20/M
  11. GPT-5.5 | OpenAI | Score 72.5 | Q 95.9 | In $1.50/M
  12. GPT-5.4 | OpenAI | Score 72.1 | Q 95.2 | In $1.50/M
  13. Gemini 2.5 Pro | Google | Score 71.2 | Q 91.5 | In $1.25/M
  14. GPT-5 nano | OpenAI | Score 68.3 | Q 47.2 | In $0.05/M
  15. Gemini 1.5 Flash | Google | Score 68.3 | Q 47.6 | In $0.08/M