Value ranking
Best value on Humanity's Last Exam
Humanity's Last Exam (HLE) is a challenging multi-disciplinary benchmark aggregating expert-written questions from across academic fields, designed to discriminate at the very top of the capability range, where MMLU-style tests saturate.
“Value” is the benchmark score, min-max normalized to 0–100 within this leaderboard cohort, divided by the input price in dollars per million tokens. Higher means more capability per dollar on this axis only; always sanity-check latency, context length, and performance on your real workload. A short sketch of the computation follows the table.
| Rank | Model | Vendor | Value | HLE Score (normalized, 0–100) | Input Price ($/M tokens) |
|-----:|-------|--------|------:|------------------------------:|-------------------------:|
| 1 | GPT-5 mini | OpenAI | 152.96 | 38.2 | $0.25 |
| 2 | Kimi K2 | Moonshot (Kimi) | 82.53 | 49.5 | $0.60 |
| 3 | Gemini 3 Pro | Google | 80.00 | 100.0 | $1.25 |
| 4 | GPT-5.2 | OpenAI | 45.90 | 57.4 | $1.25 |
| 5 | GPT-5 | OpenAI | 41.35 | 51.7 | $1.25 |
| 6 | o4-mini | OpenAI | 31.94 | 35.1 | $1.10 |
| 7 | Llama 4 Maverick | Meta | 25.07 | 6.8 | $0.27 |
| 8 | Claude Sonnet 4.6 | Anthropic | 8.39 | 25.2 | $3.00 |
| 9 | GPT-5.1 | OpenAI | 7.46 | 9.3 | $1.25 |
| 10 | Claude Opus 4.7 | Anthropic | 5.11 | 76.6 | $15.00 |
| 11 | Claude Opus 4.6 | Anthropic | 4.84 | 72.5 | $15.00 |
| 12 | o3 | OpenAI | 4.03 | 40.3 | $10.00 |
| 13 | Claude Sonnet 4 (Thinking) | Anthropic | 3.84 | 11.5 | $3.00 |
| 14 | Gemini 1.5 Pro | Google | 3.44 | 4.3 | $1.25 |
| 15 | Claude Opus 4.5 | Anthropic | 3.43 | 51.4 | $15.00 |
| 16 | Claude Sonnet 4 | Anthropic | 2.13 | 6.4 | $3.00 |
| 17 | Claude Opus 4 | Anthropic | 1.34 | 20.1 | $15.00 |
| 18 | Claude Opus 4 (Thinking) | Anthropic | 1.22 | 18.3 | $15.00 |
| 19 | Claude 3.5 Sonnet | Anthropic | 1.04 | 3.1 | $3.00 |
| 20 | GPT-4o | OpenAI | 0.00 | 0.0 | $2.50 |
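For readers who want to audit a row, here is a minimal Python sketch of the metric, assuming (as stated above) that the displayed scores are already min-max normalized within this cohort. The helper names `minmax_normalize` and `value` are illustrative, not from the leaderboard's code; all figures are copied from the table.

```python
def minmax_normalize(raw: float, cohort_min: float, cohort_max: float) -> float:
    """Upstream step, shown for reference: rescale a raw benchmark score
    to 0-100 so the cohort's best model lands at 100 and its worst at 0."""
    return 100.0 * (raw - cohort_min) / (cohort_max - cohort_min)

def value(normalized_score: float, input_price_per_m: float) -> float:
    """Capability per dollar: normalized score / input price in $ per million tokens."""
    return normalized_score / input_price_per_m

# (normalized HLE score, input price in $/M tokens), copied from the table above
rows = {
    "GPT-5 mini":      (38.2, 0.25),
    "Gemini 3 Pro":    (100.0, 1.25),
    "Claude Opus 4.7": (76.6, 15.00),
    "GPT-4o":          (0.0, 2.50),
}

for name, (score, price) in rows.items():
    print(f"{name:16} value = {value(score, price):7.2f}")
```

Small discrepancies against the table (e.g. 152.80 here vs. 152.96 for GPT-5 mini) come from the one-decimal rounding of the displayed scores, not from the formula.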
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.