Scenario guide
Best AI models for Coding Assistant
A pair-programming assistant for IDE / agent loops. Heavy on coding benchmarks, with a real-world agentic component (SWE-bench) and some weight on cost since coding loops burn tokens. A small slice goes to the recovery-rate reliability axis because real-world agent loops live or die on whether the model self-corrects.
Rankings use the same scenario weights and cost blending as the interactive leaderboard on AI Model Analyzer. Data is min-max normalised per benchmark; missing scores are skipped without penalty.
- 1 | GPT-5.5 | OpenAI | Score 86.8 | Q 99.5 | In $1.50/M
- 2 | Gemini 3 Flash | Google | Score 85.7 | Q 90.8 | In $0.30/M
- 3 | GPT-5.4 | OpenAI | Score 85.5 | Q 97.8 | In $1.50/M
- 4 | Gemini 3 Pro | Google | Score 83.9 | Q 95.0 | In $1.25/M
- 5 | DeepSeek R1 | DeepSeek | Score 81.8 | Q 86.3 | In $0.55/M
- 6 | GPT-5 | OpenAI | Score 78.9 | Q 88.7 | In $1.25/M
- 7 | DeepSeek V3 (Thinking) | DeepSeek | Score 78.4 | Q 79.0 | In $0.27/M
- 8 | GPT-5.2 | OpenAI | Score 77.6 | Q 87.1 | In $1.25/M
- 9 | Qwen3 235B (Thinking) | Alibaba (Qwen) | Score 76.8 | Q 74.8 | In $0.20/M
- 10 | Claude Sonnet 4.5 | Anthropic | Score 76.5 | Q 88.2 | In $3.00/M
- 11 | Kimi K2 | Moonshot (Kimi) | Score 75.7 | Q 79.3 | In $0.60/M
- 12 | o4-mini | OpenAI | Score 75.5 | Q 81.8 | In $1.10/M
- 13 | GPT-5.1 | OpenAI | Score 75.4 | Q 84.4 | In $1.25/M
- 14 | GLM-4.7 | Zhipu AI (GLM) | Score 75.2 | Q 77.4 | In $0.50/M
- 15 | Claude Opus 4.6 | Anthropic | Score 75.0 | Q 93.8 | In $15.00/M
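
The normalisation described above the list (min-max per benchmark, missing scores skipped without penalty) can be sketched roughly as follows. This is an illustration only, not the leaderboard's actual code: the function names, the benchmark keys, and the weights are hypothetical, and "skipped without penalty" is assumed here to mean that the remaining weights are renormalised over the benchmarks a model does have. The cost blending step is not reproduced because its formula is not stated.

```python
from typing import Dict, Optional

# model name -> benchmark name -> raw score (None when the score is missing)
Scores = Dict[str, Dict[str, Optional[float]]]


def min_max_normalise(raw: Scores) -> Scores:
    """Rescale each benchmark to [0, 1] using only the scores that exist."""
    benchmarks = {b for scores in raw.values() for b in scores}
    bounds = {}
    for b in benchmarks:
        present = [s[b] for s in raw.values() if s.get(b) is not None]
        if present:
            bounds[b] = (min(present), max(present))

    out: Scores = {}
    for model, scores in raw.items():
        out[model] = {}
        for b, value in scores.items():
            if value is None or b not in bounds:
                out[model][b] = None  # keep the gap; do not treat it as zero
                continue
            lo, hi = bounds[b]
            # Assumption: if every model ties, use the midpoint (arbitrary choice).
            out[model][b] = 0.5 if hi == lo else (value - lo) / (hi - lo)
    return out


def weighted_score(
    normalised: Dict[str, Optional[float]], weights: Dict[str, float]
) -> Optional[float]:
    """Weighted mean over available benchmarks, with weights renormalised."""
    total = 0.0
    weight_sum = 0.0
    for benchmark, weight in weights.items():
        value = normalised.get(benchmark)
        if value is None:
            continue  # missing score: skipped, no penalty
        total += weight * value
        weight_sum += weight
    return total / weight_sum if weight_sum > 0 else None


if __name__ == "__main__":
    # Hypothetical raw scores and scenario weights, for illustration only.
    raw = {
        "model_a": {"swe_bench": 80.0, "code_bench": 60.0},
        "model_b": {"swe_bench": 40.0, "code_bench": None},
        "model_c": {"swe_bench": 60.0, "code_bench": 20.0},
    }
    weights = {"swe_bench": 0.6, "code_bench": 0.4}
    norm = min_max_normalise(raw)
    for model in raw:
        print(model, weighted_score(norm[model], weights))
```

Under this assumption, a model with a missing benchmark is scored only on what it has, so a gap neither lowers nor raises its weighted mean by itself. One consequence is that a model evaluated on fewer benchmarks can rank highly on the strength of a small number of results.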