Value ranking

Best value on Recovery Rate

How often the model self-corrects after producing an incorrect intermediate step (an upstream signal for the debugging axis). Critical for agentic loops that depend on the model noticing and repairing its own mistakes rather than barrelling forward.
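As a rough sketch of the metric described above: recovery rate is the share of incorrect intermediate steps that the model later repairs. The function name and trace format below are illustrative assumptions, not the leaderboard's actual harness.

```python
# Hypothetical sketch (not the leaderboard's real scoring code):
# recovery rate = repaired mistakes / total mistakes, as a 0-100 score.
def recovery_rate(steps):
    """steps: list of (was_incorrect, was_later_corrected) pairs."""
    errors = [s for s in steps if s[0]]
    if not errors:
        return 100.0  # no mistakes, nothing to recover from
    recovered = sum(1 for s in errors if s[1])
    return 100.0 * recovered / len(errors)

# Three mistakes in the trace, two of them repaired -> 66.7.
trace = [(False, False), (True, True), (True, False), (True, True)]
print(round(recovery_rate(trace), 1))  # 66.7
```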

“Value” is the normalized benchmark score (0–100 for this leaderboard cohort) divided by input price in dollars per million tokens. Higher means more capability per dollar on this axis only; always sanity-check latency, context length, and your real workload.
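The value column reduces to a one-line formula; a minimal sketch, using two entries from the table below as a check:

```python
# "Value" = normalized score (0-100) / input price in $ per million tokens.
def value(score, price_per_m_tokens):
    return score / price_per_m_tokens

# Gemini 2.5 Flash: 100.0 score at $0.30/M input tokens.
print(round(value(100.0, 0.30), 2))  # 333.33
# Claude Opus 4.6: 100.0 score at $15.00/M input tokens.
print(round(value(100.0, 15.00), 2))  # 6.67
```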

Rank  Model              Vendor           Value   Score / Input price
  1   DeepSeek V3        DeepSeek         369.70  99.8 / $0.27/M
  2   Gemini 2.5 Flash   Google           333.33  100.0 / $0.30/M
  3   DeepSeek R1        DeepSeek         181.49  99.8 / $0.55/M
  4   Kimi K2            Moonshot (Kimi)  166.37  99.8 / $0.60/M
  5   Gemini 3 Pro       Google            80.00  100.0 / $1.25/M
  6   GPT-5.2            OpenAI            79.86  99.8 / $1.25/M
  7   GPT-5.4            OpenAI            66.55  99.8 / $1.50/M
  8   GPT-5.5            OpenAI            66.55  99.8 / $1.50/M
  9   Claude Sonnet 4    Anthropic         33.33  100.0 / $3.00/M
 10   Claude Sonnet 4.5  Anthropic         33.33  100.0 / $3.00/M
 11   Claude Sonnet 4.6  Anthropic         33.33  100.0 / $3.00/M
 12   Grok 4             xAI               20.00  100.0 / $5.00/M
 13   Claude Opus 4.6    Anthropic          6.67  100.0 / $15.00/M
 14   Claude Opus 4.7    Anthropic          6.67  100.0 / $15.00/M
 15   Claude Opus 4      Anthropic          6.45  96.7 / $15.00/M
 16   Claude Opus 4.5    Anthropic          6.34  95.1 / $15.00/M
 17   GLM-4.6            Zhipu AI (GLM)     0.00  0.0 / $0.50/M
 18   GLM-4.7            Zhipu AI (GLM)     0.00  0.0 / $0.50/M

AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.