Scenario guide
Best AI Models for a Coding Assistant
A pair-programming assistant for IDE / agent loops. The weighting is heavy on coding benchmarks, includes a real-world agentic component (SWE-bench), and gives some weight to cost, since coding loops burn tokens. A small slice goes to the recovery-rate reliability axis, because real-world agent loops live or die on whether the model self-corrects.
Rankings use the same scenario weights and cost blending as the interactive leaderboard on AI Model Analyzer. Data is min-max normalised per benchmark; missing scores are skipped without penalty.
| Rank | Model | Provider | Score | Q | Input ($/M) |
|---|---|---|---|---|---|
| 1 | Gemini 3 Pro | Google | 85.9 | 97.5 | $1.25 |
| 2 | Gemini 3 Flash | Google | 85.8 | 90.9 | $0.30 |
| 3 | GPT-5.5 | OpenAI | 84.4 | 96.4 | $1.50 |
| 4 | GPT-5.4 | OpenAI | 82.7 | 94.3 | $1.50 |
| 5 | DeepSeek R1 | DeepSeek | 81.6 | 86.2 | $0.55 |
| 6 | GPT-5 | OpenAI | 78.0 | 87.6 | $1.25 |
| 7 | GPT-5.2 | OpenAI | 77.6 | 87.1 | $1.25 |
| 8 | Claude Opus 4.6 | Anthropic | 77.0 | 96.2 | $15.00 |
| 9 | DeepSeek V3 (Thinking) | DeepSeek | 76.8 | 77.0 | $0.27 |
| 10 | Claude Sonnet 4.6 | Anthropic | 76.6 | 88.3 | $3.00 |
| 11 | Claude Opus 4.7 | Anthropic | 76.3 | 95.4 | $15.00 |
| 12 | Kimi K2 | Moonshot (Kimi) | 75.7 | 79.3 | $0.60 |
| 13 | Claude Sonnet 4.5 | Anthropic | 75.1 | 86.5 | $3.00 |
| 14 | o4-mini | OpenAI | 75.1 | 81.2 | $1.10 |
| 15 | Qwen3 235B (Thinking) | Alibaba (Qwen) | 74.3 | 71.7 | $0.20 |
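The scoring described above — min-max normalising each benchmark, skipping missing scores without penalty, then blending with scenario weights — can be sketched in a few lines. This is a minimal illustration, not the site's actual pipeline; the benchmark names, weights, and scores below are hypothetical.

```python
def min_max_normalise(scores):
    """Map one benchmark's raw scores to [0, 1]; None entries stay None."""
    present = [s for s in scores.values() if s is not None]
    lo, hi = min(present), max(present)
    span = (hi - lo) or 1.0
    return {m: (None if s is None else (s - lo) / span)
            for m, s in scores.items()}

def blended_score(model, benchmarks, weights):
    """Weighted mean over only the benchmarks the model actually has,
    so a missing score is skipped rather than counted as zero."""
    num = den = 0.0
    for bench, weight in weights.items():
        s = benchmarks[bench].get(model)
        if s is None:  # missing score: no penalty
            continue
        num += weight * s
        den += weight
    return num / den if den else 0.0

# Hypothetical data: two benchmarks, three models, one missing score.
raw = {
    "swe_bench": {"A": 70.0, "B": 55.0, "C": None},
    "humaneval": {"A": 92.0, "B": 88.0, "C": 95.0},
}
weights = {"swe_bench": 0.6, "humaneval": 0.4}

norm = {bench: min_max_normalise(col) for bench, col in raw.items()}
ranking = sorted(weights and {"A", "B", "C"},
                 key=lambda m: blended_score(m, norm, weights),
                 reverse=True)
```

Note that model C, which lacks a SWE-bench score, is ranked only on the benchmarks it has; whether skipping beats imputing a floor score is a design choice the leaderboard makes in favour of skipping.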