Value ranking
Best value on Frontier Composite
The Frontier Composite is a saturation-resistant capability score stitched together from ~40 underlying benchmarks using Item Response Theory (IRT). Each benchmark is weighted by its fitted difficulty and discrimination slope, so doing well on hard, contamination-resistant evals (FrontierMath, ARC-AGI 2, Humanity's Last Exam) moves the score, while saturated benchmarks contribute almost nothing. Scores are imported per-model from Epoch AI's published index; we anchor them to the same min-max scale we use for every other benchmark so they're directly weightable in scenarios.
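To make the weighting concrete, here is a minimal sketch of a two-parameter-logistic (2PL) IRT fit. This is not Epoch AI's actual pipeline: the benchmark names, item parameters, and accuracies below are hypothetical, and each benchmark accuracy is treated as a fractional item response. It shows why saturated benchmarks barely matter: their pass probability is near 1 at every plausible ability level, so the likelihood is flat in them.

```python
import math

def two_pl(theta: float, a: float, b: float) -> float:
    """2PL item response curve: P(pass | ability theta) for an item
    with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fit_ability(accuracies: dict[str, float],
                items: dict[str, tuple[float, float]]) -> float:
    """Grid-search the ability theta that maximizes a binomial-style
    log-likelihood of the observed per-benchmark accuracies."""
    def loglik(theta: float) -> float:
        return sum(
            acc * math.log(two_pl(theta, *items[name]))
            + (1.0 - acc) * math.log(1.0 - two_pl(theta, *items[name]))
            for name, acc in accuracies.items()
        )
    grid = [t / 100.0 for t in range(-400, 401)]
    return max(grid, key=loglik)

def anchor(theta: float, cohort_thetas: list[float]) -> float:
    """Min-max anchor a fitted ability onto the 0-100 leaderboard scale."""
    lo, hi = min(cohort_thetas), max(cohort_thetas)
    return 100.0 * (theta - lo) / (hi - lo)

# Hypothetical items: (discrimination a, difficulty b). The easy,
# near-saturated item contributes almost nothing to the fit.
items = {"FrontierMath": (1.5, 2.5), "ARC-AGI 2": (1.2, 1.8), "MMLU": (1.0, -2.0)}
accs = {"FrontierMath": 0.25, "ARC-AGI 2": 0.40, "MMLU": 0.92}
theta = fit_ability(accs, items)
```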
“Value” is the normalized benchmark score (0–100 within this leaderboard cohort) divided by input price per million tokens. Higher means more capability per dollar on this axis only; always sanity-check latency, context length, and performance on your real workload. A sketch that reproduces the calculation follows the table.
| Rank | Model | Vendor | Value | Score (0–100) | Input price |
|-----:|-------|--------|------:|--------------:|------------:|
| 1 | GPT-5 nano | OpenAI | 1052.80 | 52.6 | $0.05/M |
| 2 | Gemini 2.0 Flash | Google | 398.00 | 39.8 | $0.10/M |
| 3 | Gemini 1.5 Flash | Google | 333.20 | 25.0 | $0.08/M |
| 4 | Qwen3 235B (Thinking) | Alibaba (Qwen) | 331.65 | 66.3 | $0.20/M |
| 5 | Gemini 3 Flash | Google | 269.93 | 81.0 | $0.30/M |
| 6 | GPT-5 mini | OpenAI | 265.28 | 66.3 | $0.25/M |
| 7 | DeepSeek V3 | DeepSeek | 255.44 | 69.0 | $0.27/M |
| 8 | Qwen3 235B | Alibaba (Qwen) | 251.50 | 50.3 | $0.20/M |
| 9 | DeepSeek V3 (Thinking) | DeepSeek | 241.22 | 65.1 | $0.27/M |
| 10 | Gemini 2.5 Flash | Google | 201.10 | 60.3 | $0.30/M |
| 11 | Llama 4 Scout | Meta | 141.17 | 25.4 | $0.18/M |
| 12 | GLM-4.7 | Zhipu AI (GLM) | 127.26 | 63.6 | $0.50/M |
| 13 | DeepSeek R1 | DeepSeek | 125.40 | 69.0 | $0.55/M |
| 14 | Kimi K2 | Moonshot (Kimi) | 122.70 | 73.6 | $0.60/M |
| 15 | Llama 4 Maverick | Meta | 119.52 | 32.3 | $0.27/M |
| 16 | GLM-4.6 | Zhipu AI (GLM) | 111.60 | 55.8 | $0.50/M |
| 17 | GPT-4o mini | OpenAI | 105.20 | 15.8 | $0.15/M |
| 18 | Gemini 3 Pro | Google | 77.80 | 97.3 | $1.25/M |
| 19 | GPT-5.2 | OpenAI | 70.94 | 88.7 | $1.25/M |
| 20 | GPT-5.5 | OpenAI | 66.67 | 100.0 | $1.50/M |
| 21 | GPT-5.4 | OpenAI | 63.58 | 95.4 | $1.50/M |
| 22 | o4-mini | OpenAI | 63.55 | 69.9 | $1.10/M |
| 23 | GPT-5 | OpenAI | 62.76 | 78.5 | $1.25/M |
| 24 | GPT-5.1 | OpenAI | 62.10 | 77.6 | $1.25/M |
| 25 | Claude Haiku 4.5 | Anthropic | 59.70 | 59.7 | $1.00/M |
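As a cross-check, the value column is just the score-to-price ratio, re-sorted descending. A minimal sketch over the first few rows (the score and price literals below are the table's rounded display values, so recomputed figures can drift slightly from the listed ones, e.g. 52.6 / 0.05 = 1052.0 against the listed 1052.80, which was computed from the unrounded score):

```python
# (model, normalized score 0-100, input price in $ per million tokens)
rows = [
    ("GPT-5 nano", 52.6, 0.05),
    ("Gemini 2.0 Flash", 39.8, 0.10),
    ("Gemini 3 Flash", 81.0, 0.30),
]

# Value = score / price; best value (most capability per dollar) first.
ranked = sorted(
    ((name, score / price) for name, score, price in rows),
    key=lambda row: row[1],
    reverse=True,
)
for rank, (name, value) in enumerate(ranked, start=1):
    print(f"{rank}. {name}: {value:.2f}")
```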
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.