Scenario guide
Best AI Models for a Coding Assistant
A pair-programming assistant for IDE / agent loops. The weighting is heavy on coding benchmarks, includes a real-world agentic component (SWE-bench), and gives some weight to cost, since coding loops burn tokens. A small slice goes to the recovery-rate reliability axis, because real-world agent loops live or die on whether the model self-corrects.
Rankings use the same scenario weights and cost blending as the interactive leaderboard on AI Model Analyzer. Data is min-max normalised per benchmark; missing scores are skipped without penalty.
| Rank | Model | Provider | Score | Q | Input ($/M) |
|---|---|---|---|---|---|
| 1 | Gemini 3 Pro | Google | 85.9 | 97.5 | $1.25 |
| 2 | Gemini 3 Flash | Google | 85.8 | 90.9 | $0.30 |
| 3 | GPT-5.5 | OpenAI | 84.4 | 96.4 | $1.50 |
| 4 | GPT-5.4 | OpenAI | 82.7 | 94.3 | $1.50 |
| 5 | DeepSeek R1 | DeepSeek | 81.6 | 86.2 | $0.55 |
| 6 | GPT-5 | OpenAI | 78.0 | 87.6 | $1.25 |
| 7 | GPT-5.2 | OpenAI | 77.6 | 87.1 | $1.25 |
| 8 | Claude Opus 4.6 | Anthropic | 77.0 | 96.2 | $15.00 |
| 9 | DeepSeek V3 (Thinking) | DeepSeek | 76.8 | 77.0 | $0.27 |
| 10 | Claude Sonnet 4.6 | Anthropic | 76.6 | 88.3 | $3.00 |
| 11 | Claude Opus 4.7 | Anthropic | 76.3 | 95.4 | $15.00 |
| 12 | Kimi K2 | Moonshot (Kimi) | 75.7 | 79.3 | $0.60 |
| 13 | Claude Sonnet 4.5 | Anthropic | 75.1 | 86.5 | $3.00 |
| 14 | o4-mini | OpenAI | 75.1 | 81.2 | $1.10 |
| 15 | Qwen3 235B (Thinking) | Alibaba (Qwen) | 74.3 | 71.7 | $0.20 |
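The scoring described above — min-max normalising each benchmark, skipping missing scores without penalty, then blending with scenario weights — can be sketched in a few lines. This is a minimal illustration, not the site's actual pipeline; the benchmark names, weights, and scores below are hypothetical.

```python
def min_max_normalise(scores):
    """Map one benchmark's raw scores to [0, 1]; None entries stay None."""
    present = [s for s in scores.values() if s is not None]
    lo, hi = min(present), max(present)
    span = (hi - lo) or 1.0
    return {m: (None if s is None else (s - lo) / span)
            for m, s in scores.items()}

def blended_score(model, benchmarks, weights):
    """Weighted mean over only the benchmarks the model actually has,
    so a missing score is skipped rather than counted as zero."""
    num = den = 0.0
    for bench, weight in weights.items():
        s = benchmarks[bench].get(model)
        if s is None:  # missing score: no penalty
            continue
        num += weight * s
        den += weight
    return num / den if den else 0.0

# Hypothetical data: two benchmarks, three models, one missing score.
raw = {
    "swe_bench": {"A": 70.0, "B": 55.0, "C": None},
    "humaneval": {"A": 92.0, "B": 88.0, "C": 95.0},
}
weights = {"swe_bench": 0.6, "humaneval": 0.4}

norm = {bench: min_max_normalise(col) for bench, col in raw.items()}
ranking = sorted(weights and {"A", "B", "C"},
                 key=lambda m: blended_score(m, norm, weights),
                 reverse=True)
```

Note that model C, which lacks a SWE-bench score, is ranked only on the benchmarks it has; whether skipping beats imputing a floor score is a design choice the leaderboard makes in favour of skipping.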