Value ranking
Best value on ARC-AGI 2
Second-generation ARC challenge testing fluid reasoning over abstract visual puzzles. Resists training-data memorisation by construction: each puzzle is novel and solutions require multi-step pattern induction. Frontier models are only just starting to score above chance on the harder tier.
“Value” is the normalized benchmark score (0–100 for this leaderboard cohort) divided by input price per million tokens. Higher means more capability per dollar on this axis only; always sanity-check latency, context length, and your real workload.
| Rank | Model | Vendor | Value | ARC-AGI 2 score | Input price |
|---:|---|---|---:|---:|---:|
| 1 | Gemini 3 Flash | Google | 131.80 | 39.5 | $0.30/M |
| 2 | Gemini 3 Pro | Google | 72.57 | 90.7 | $1.25/M |
| 3 | GPT-5.5 | OpenAI | 66.67 | 100.0 | $1.50/M |
| 4 | GPT-5 nano | OpenAI | 61.40 | 3.1 | $0.05/M |
| 5 | GPT-5.4 | OpenAI | 58.00 | 87.0 | $1.50/M |
| 6 | GPT-5.2 | OpenAI | 49.80 | 62.3 | $1.25/M |
| 7 | Claude Sonnet 4.6 | Anthropic | 23.69 | 71.1 | $3.00/M |
| 8 | Kimi K2 | Moonshot (Kimi) | 23.15 | 13.9 | $0.60/M |
| 9 | GPT-5 mini | OpenAI | 20.88 | 5.2 | $0.25/M |
| 10 | DeepSeek V3 | DeepSeek | 17.56 | 4.7 | $0.27/M |
| 11 | GPT-5.1 | OpenAI | 16.60 | 20.8 | $1.25/M |
| 12 | Gemini 2.0 Flash | Google | 15.30 | 1.5 | $0.10/M |
| 13 | GPT-5 | OpenAI | 9.28 | 11.6 | $1.25/M |
| 14 | o4-mini | OpenAI | 6.54 | 7.2 | $1.10/M |
| 15 | Claude Opus 4.7 | Anthropic | 5.95 | 89.2 | $15.00/M |
| 16 | Claude Opus 4.6 | Anthropic | 5.43 | 81.4 | $15.00/M |
| 17 | Claude Sonnet 4.5 | Anthropic | 5.34 | 16.0 | $3.00/M |
| 18 | Claude Haiku 4.5 | Anthropic | 4.74 | 4.7 | $1.00/M |
| 19 | Grok 4 | xAI | 3.76 | 18.8 | $5.00/M |
| 20 | o3-mini | OpenAI | 3.20 | 3.5 | $1.10/M |
| 21 | Claude Opus 4.5 | Anthropic | 2.95 | 44.3 | $15.00/M |
| 22 | DeepSeek R1 | DeepSeek | 2.78 | 1.5 | $0.55/M |
| 23 | o3 | OpenAI | 0.77 | 7.7 | $10.00/M |
| 24 | Gemini 1.5 Pro | Google | 0.75 | 0.9 | $1.25/M |
| 25 | Claude Opus 4 | Anthropic | 0.68 | 10.1 | $15.00/M |
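The value metric described above can be reproduced directly from the table. A minimal sketch: the `value_score` helper is illustrative (not part of any vendor API), and the three example rows copy score and price figures from the leaderboard.

```python
def value_score(score: float, price_per_mtok: float) -> float:
    """Benchmark score (0-100) divided by input price per million tokens."""
    return score / price_per_mtok

# Example rows from the table: (ARC-AGI 2 score, input $ per million tokens).
models = {
    "GPT-5.5": (100.0, 1.50),
    "Gemini 3 Pro": (90.7, 1.25),
    "Claude Opus 4.7": (89.2, 15.00),
}

# Rank by value, highest first, mirroring the leaderboard ordering.
for name, (score, price) in sorted(
    models.items(), key=lambda kv: value_score(*kv[1]), reverse=True
):
    print(f"{name}: {value_score(score, price):.2f}")
```

Note how the division compresses the ranking: Claude Opus 4.7 scores nearly as high as GPT-5.5, but its much higher input price pushes it far down the value axis.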
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.