Scenario guide

Best AI models for Customer Support Bot

A high-volume B2C chatbot. We weight Arena (human preference) and IFEval (does the model follow your formatting instructions?) heavily, and lean on cost because volume is enormous. Reliability axes (format adherence, safety handling) also carry a modest weight, because a chatbot that answers off-format or falsely refuses benign requests is unusable regardless of how smart it is.

Rankings use the same scenario weights and cost blending as the interactive leaderboard on AI Model Analyzer. Data is min-max normalised per benchmark; missing scores are skipped without penalty.
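
To make the normalisation step concrete, here is a minimal sketch of per-benchmark min-max scaling followed by a weighted average that skips missing scores. The model names, raw scores, and weights are hypothetical, chosen only for illustration; they are not the leaderboard's actual values. Renormalising the remaining weights is one reasonable reading of "skipped without penalty" and is an assumption here. The cost blending step is left out because its formula is not given.

```python
from typing import Dict, Optional

# model -> benchmark -> raw score (None when the benchmark is missing)
Scores = Dict[str, Dict[str, Optional[float]]]


def min_max_normalise(raw: Scores) -> Scores:
    """Rescale each benchmark to [0, 1] across the models that report it."""
    benchmarks = {b for scores in raw.values() for b in scores}
    out: Scores = {model: {} for model in raw}
    for b in benchmarks:
        present = [s[b] for s in raw.values() if s.get(b) is not None]
        if not present:
            continue  # nobody reports this benchmark; nothing to scale
        lo, hi = min(present), max(present)
        span = hi - lo
        for model, s in raw.items():
            v = s.get(b)
            if v is None:
                out[model][b] = None
            else:
                # If every model ties, treat them all as top of the range.
                out[model][b] = (v - lo) / span if span > 0 else 1.0
    return out


def weighted_score(norm: Dict[str, Optional[float]],
                   weights: Dict[str, float]) -> Optional[float]:
    """Weighted mean over the benchmarks a model actually has.

    Missing benchmarks are dropped and the remaining weights are
    renormalised, so a gap does not pull the score down (assumed reading
    of "skipped without penalty").
    """
    pairs = [(norm[b], w) for b, w in weights.items() if norm.get(b) is not None]
    total = sum(w for _, w in pairs)
    if total == 0:
        return None
    return sum(v * w for v, w in pairs) / total


# Hypothetical inputs, for illustration only.
raw = {
    "model_a": {"arena": 1400.0, "ifeval": 90.0},
    "model_b": {"arena": 1300.0, "ifeval": None},  # IFEval score missing
    "model_c": {"arena": 1350.0, "ifeval": 80.0},
}
weights = {"arena": 0.6, "ifeval": 0.4}  # hypothetical scenario weights

norm = min_max_normalise(raw)
ranking = sorted(raw, key=lambda m: weighted_score(norm[m], weights) or 0.0,
                 reverse=True)
print(ranking)
```

In this sketch, model_b is scored on Arena alone rather than being treated as if it had scored zero on IFEval.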

  1. Qwen3 235B (Thinking)
     Alibaba (Qwen)
     Score 82.3 | Q 81.4 | In $0.20/M
  2. Gemini 3 Flash
     Google
     Score 82.0 | Q 92.3 | In $0.30/M
  3. DeepSeek V3 (Thinking)
     DeepSeek
     Score 81.2 | Q 84.8 | In $0.27/M
  4. GLM-4.6
     Zhipu AI (GLM)
     Score 80.3 | Q 90.1 | In $0.50/M
  5. DeepSeek V3
     DeepSeek
     Score 80.2 | Q 83.2 | In $0.27/M
  6. Gemini 2.0 Flash
     Google
     Score 79.4 | Q 69.5 | In $0.10/M
  7. GLM-4.7
     Zhipu AI (GLM)
     Score 78.9 | Q 87.8 | In $0.50/M
  8. DeepSeek R1
     DeepSeek
     Score 76.5 | Q 85.6 | In $0.55/M
  9. Gemini 2.5 Flash
     Google
     Score 76.2 | Q 82.6 | In $0.30/M
  10. Qwen3 235B
      Alibaba (Qwen)
      Score 75.4 | Q 69.9 | In $0.20/M
  11. Gemini 2.5 Pro
      Google
      Score 74.7 | Q 97.3 | In $1.25/M
  12. Kimi K2
      Moonshot (Kimi)
      Score 74.7 | Q 84.0 | In $0.60/M
  13. GPT-5.5
      OpenAI
      Score 74.6 | Q 99.4 | In $1.50/M
  14. Gemini 3 Pro
      Google
      Score 73.5 | Q 95.3 | In $1.25/M
  15. GPT-5 nano
      OpenAI
      Score 70.7 | Q 51.2 | In $0.05/M