Value ranking
Best value on Humanity's Last Exam
Humanity's Last Exam (HLE) is a challenging multi-disciplinary benchmark aggregating expert-written questions from across academic fields, designed to discriminate at the very top of the capability range, where MMLU-style tests saturate.
“Value” is the benchmark score, min-max normalized to 0–100 within this leaderboard cohort, divided by the input price in dollars per million tokens. Higher means more capability per dollar on this axis only; always sanity-check latency, context length, and performance on your real workload. A short sketch of the computation follows the table.
| Rank | Model | Vendor | Value | HLE Score (normalized, 0–100) | Input Price ($/M tokens) |
|-----:|-------|--------|------:|------------------------------:|-------------------------:|
| 1 | GPT-5 mini | OpenAI | 152.96 | 38.2 | $0.25 |
| 2 | Kimi K2 | Moonshot (Kimi) | 82.53 | 49.5 | $0.60 |
| 3 | Gemini 3 Pro | Google | 80.00 | 100.0 | $1.25 |
| 4 | GPT-5.2 | OpenAI | 45.90 | 57.4 | $1.25 |
| 5 | GPT-5 | OpenAI | 41.35 | 51.7 | $1.25 |
| 6 | o4-mini | OpenAI | 31.94 | 35.1 | $1.10 |
| 7 | Llama 4 Maverick | Meta | 25.07 | 6.8 | $0.27 |
| 8 | Claude Sonnet 4.6 | Anthropic | 8.39 | 25.2 | $3.00 |
| 9 | GPT-5.1 | OpenAI | 7.46 | 9.3 | $1.25 |
| 10 | Claude Opus 4.7 | Anthropic | 5.11 | 76.6 | $15.00 |
| 11 | Claude Opus 4.6 | Anthropic | 4.84 | 72.5 | $15.00 |
| 12 | o3 | OpenAI | 4.03 | 40.3 | $10.00 |
| 13 | Claude Sonnet 4 (Thinking) | Anthropic | 3.84 | 11.5 | $3.00 |
| 14 | Gemini 1.5 Pro | Google | 3.44 | 4.3 | $1.25 |
| 15 | Claude Opus 4.5 | Anthropic | 3.43 | 51.4 | $15.00 |
| 16 | Claude Sonnet 4 | Anthropic | 2.13 | 6.4 | $3.00 |
| 17 | Claude Opus 4 | Anthropic | 1.34 | 20.1 | $15.00 |
| 18 | Claude Opus 4 (Thinking) | Anthropic | 1.22 | 18.3 | $15.00 |
| 19 | Claude 3.5 Sonnet | Anthropic | 1.04 | 3.1 | $3.00 |
| 20 | GPT-4o | OpenAI | 0.00 | 0.0 | $2.50 |
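For readers who want to audit a row, here is a minimal Python sketch of the metric, assuming (as stated above) that the displayed scores are already min-max normalized within this cohort. The helper names `minmax_normalize` and `value` are illustrative, not from the leaderboard's code; all figures are copied from the table.

```python
def minmax_normalize(raw: float, cohort_min: float, cohort_max: float) -> float:
    """Upstream step, shown for reference: rescale a raw benchmark score
    to 0-100 so the cohort's best model lands at 100 and its worst at 0."""
    return 100.0 * (raw - cohort_min) / (cohort_max - cohort_min)

def value(normalized_score: float, input_price_per_m: float) -> float:
    """Capability per dollar: normalized score / input price in $ per million tokens."""
    return normalized_score / input_price_per_m

# (normalized HLE score, input price in $/M tokens), copied from the table above
rows = {
    "GPT-5 mini":      (38.2, 0.25),
    "Gemini 3 Pro":    (100.0, 1.25),
    "Claude Opus 4.7": (76.6, 15.00),
    "GPT-4o":          (0.0, 2.50),
}

for name, (score, price) in rows.items():
    print(f"{name:16} value = {value(score, price):7.2f}")
```

Small discrepancies against the table (e.g. 152.80 here vs. 152.96 for GPT-5 mini) come from the one-decimal rounding of the displayed scores, not from the formula.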
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.