Best AI models for Coding Assistant

This scenario models a pair-programming assistant for IDE and agent loops. It weights coding benchmarks heavily, includes a real-world agentic component (SWE-bench), and gives some weight to cost, since coding loops burn tokens. A small slice goes to the recovery-rate reliability axis, because real-world agent loops live or die on whether the model self-corrects.

Rankings use the same scenario weights and cost blending as the interactive leaderboard on AI Model Analyzer. Data is min-max normalised per benchmark; missing scores are skipped without penalty.
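As a rough illustration of the normalisation described above, the sketch below min-max scales each benchmark across the models that report it, then takes a weighted average over only the benchmarks a model has. The benchmark names, weights, and raw scores are hypothetical, and renormalising the weights over the available benchmarks is just one reading of "skipped without penalty". Cost blending is not modelled here because its formula is not given.

```python
from typing import Dict, Optional

# model -> benchmark -> raw score (None when the model has no score)
Scores = Dict[str, Dict[str, Optional[float]]]


def min_max_normalise(raw: Scores) -> Scores:
    """Rescale each benchmark to [0, 1] across the models that report it."""
    benchmarks = {b for per_model in raw.values() for b in per_model}
    out: Scores = {m: {} for m in raw}
    for b in benchmarks:
        present = [s[b] for s in raw.values() if s.get(b) is not None]
        if not present:
            # Nobody reports this benchmark: keep it missing everywhere.
            for m in raw:
                out[m][b] = None
            continue
        lo, hi = min(present), max(present)
        for m, s in raw.items():
            v = s.get(b)
            if v is None:
                out[m][b] = None
            elif hi == lo:
                out[m][b] = 1.0  # assumption: identical scores all map to 1.0
            else:
                out[m][b] = (v - lo) / (hi - lo)
    return out


def weighted_score(norm: Dict[str, Optional[float]],
                   weights: Dict[str, float]) -> Optional[float]:
    """Weighted mean over available benchmarks; missing ones are skipped."""
    total = 0.0
    weight_sum = 0.0
    for b, w in weights.items():
        v = norm.get(b)
        if v is None:
            continue  # skip without penalty: the weight is left out too
        total += w * v
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else None


# Hypothetical data, for illustration only.
raw = {
    "model_a": {"bench_x": 80.0, "bench_y": 50.0},
    "model_b": {"bench_x": 60.0, "bench_y": None},
    "model_c": {"bench_x": 70.0, "bench_y": 30.0},
}
weights = {"bench_x": 0.75, "bench_y": 0.25}

normalised = min_max_normalise(raw)
ranking = {m: weighted_score(normalised[m], weights) for m in raw}
```

Because the weights are renormalised per model, a model with a missing benchmark is scored only on what it reports; a different policy (for example, imputing the minimum) would rank such models lower.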

  1. GPT-5.5 (OpenAI): Score 86.8, Q 99.5, In $1.50/M
  2. Gemini 3 Flash (Google): Score 85.7, Q 90.8, In $0.30/M
  3. GPT-5.4 (OpenAI): Score 85.5, Q 97.8, In $1.50/M
  4. Gemini 3 Pro (Google): Score 83.9, Q 95.0, In $1.25/M
  5. DeepSeek R1 (DeepSeek): Score 81.8, Q 86.3, In $0.55/M
  6. GPT-5 (OpenAI): Score 78.9, Q 88.7, In $1.25/M
  7. DeepSeek V3 (Thinking) (DeepSeek): Score 78.4, Q 79.0, In $0.27/M
  8. GPT-5.2 (OpenAI): Score 77.6, Q 87.1, In $1.25/M
  9. Qwen3 235B (Thinking) (Alibaba (Qwen)): Score 76.8, Q 74.8, In $0.20/M
  10. Claude Sonnet 4.5 (Anthropic): Score 76.5, Q 88.2, In $3.00/M
  11. Kimi K2 (Moonshot (Kimi)): Score 75.7, Q 79.3, In $0.60/M
  12. o4-mini (OpenAI): Score 75.5, Q 81.8, In $1.10/M
  13. GPT-5.1 (OpenAI): Score 75.4, Q 84.4, In $1.25/M
  14. GLM-4.7 (Zhipu AI (GLM)): Score 75.2, Q 77.4, In $0.50/M
  15. Claude Opus 4.6 (Anthropic): Score 75.0, Q 93.8, In $15.00/M