AMA

Value ranking

Best value on Humanity's Last Exam

A challenging multi-disciplinary exam that aggregates expert-written questions from across academic fields, designed to discriminate at the very top of the capability range where MMLU-style tests saturate.

“Value” is the normalized benchmark score (0–100 for this leaderboard cohort) divided by the input price in dollars per million tokens. Higher means more capability per dollar on this axis only; always sanity-check latency, context length, and your real workload.
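
As a quick check, any row's value figure can be recomputed from the two numbers listed with it. A minimal Python sketch of that ratio, using three rows from the ranking below; small gaps against the listed values (for example 152.80 vs. 152.96 for GPT-5 mini) come from rounding in the displayed scores:

```python
def value_score(normalized_score: float, input_price_per_m_tokens: float) -> float:
    """Value = normalized benchmark score / input price per million tokens."""
    return normalized_score / input_price_per_m_tokens

# (model, normalized HLE score, input price in $ per million tokens), from the ranking below
rows = [
    ("GPT-5 mini",       38.2,  0.25),
    ("Gemini 3 Pro",    100.0,  1.25),
    ("Claude Opus 4.7",  76.6, 15.00),
]

for model, score, price in rows:
    print(f"{model:<16} value = {value_score(score, price):7.2f}")
# GPT-5 mini       value =  152.80
# Gemini 3 Pro     value =   80.00
# Claude Opus 4.7  value =    5.11
```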

  1. GPT-5 mini, OpenAI: value 152.96 (score 38.2, input $0.25/M)
  2. Kimi K2, Moonshot (Kimi): value 82.53 (score 49.5, input $0.60/M)
  3. Gemini 3 Pro, Google: value 80.00 (score 100.0, input $1.25/M)
  4. GPT-5.2, OpenAI: value 45.90 (score 57.4, input $1.25/M)
  5. GPT-5, OpenAI: value 41.35 (score 51.7, input $1.25/M)
  6. o4-mini, OpenAI: value 31.94 (score 35.1, input $1.10/M)
  7. Llama 4 Maverick, Meta: value 25.07 (score 6.8, input $0.27/M)
  8. Claude Sonnet 4.6, Anthropic: value 8.39 (score 25.2, input $3.00/M)
  9. GPT-5.1, OpenAI: value 7.46 (score 9.3, input $1.25/M)
  10. Claude Opus 4.7, Anthropic: value 5.11 (score 76.6, input $15.00/M)
  11. Claude Opus 4.6, Anthropic: value 4.84 (score 72.5, input $15.00/M)
  12. o3, OpenAI: value 4.03 (score 40.3, input $10.00/M)
  13. Claude Sonnet 4 (Thinking), Anthropic: value 3.84 (score 11.5, input $3.00/M)
  14. Gemini 1.5 Pro, Google: value 3.44 (score 4.3, input $1.25/M)
  15. Claude Opus 4.5, Anthropic: value 3.43 (score 51.4, input $15.00/M)
  16. Claude Sonnet 4, Anthropic: value 2.13 (score 6.4, input $3.00/M)
  17. Claude Opus 4, Anthropic: value 1.34 (score 20.1, input $15.00/M)
  18. Claude Opus 4 (Thinking), Anthropic: value 1.22 (score 18.3, input $15.00/M)
  19. Claude 3.5 Sonnet, Anthropic: value 1.04 (score 3.1, input $3.00/M)
  20. GPT-4o, OpenAI: value 0.00 (score 0.0, input $2.50/M)
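
The page does not state how the 0–100 normalization is produced; a min-max rescaling of raw HLE scores over this cohort is one reading that matches the endpoints in the ranking (GPT-4o at 0.0, Gemini 3 Pro at 100.0). A sketch under that assumption; the raw_scores values are illustrative placeholders, not real benchmark results:

```python
def min_max_normalize(raw_scores: dict[str, float]) -> dict[str, float]:
    """Rescale raw scores so the cohort's minimum maps to 0 and its maximum to 100."""
    lo, hi = min(raw_scores.values()), max(raw_scores.values())
    return {model: 100.0 * (score - lo) / (hi - lo) for model, score in raw_scores.items()}

# Illustrative placeholder values only, not actual HLE results.
raw_scores = {"weakest model": 4.0, "mid model": 18.0, "strongest model": 39.0}
print(min_max_normalize(raw_scores))
# {'weakest model': 0.0, 'mid model': 40.0, 'strongest model': 100.0}
```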

AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.