Value ranking

Best value on HumanEval

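HumanEval consists of 164 hand-written Python programming problems; a solution counts as correct when it passes the problem's unit tests. The benchmark is saturated for frontier models, so it separates the top of the field poorly.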

“Value” is the normalized benchmark score (0–100 within this leaderboard cohort) divided by the input price per million tokens. A higher value means more capability per dollar on this axis only, so always sanity-check latency, context length, and your real workload.
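The definition above can be sketched in a few lines of Python. This is an illustrative sketch, not the site's actual code: the `Entry` class, its field names, and `rank_by_value` are hypothetical, and the four sample rows are copied from the ranking on this page. Some listed values differ in the second decimal from the quotient of the displayed score and price, presumably because the displayed score is rounded; that is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical structure, for illustration only)."""
    model: str
    score: float              # normalized benchmark score, 0-100
    input_price_per_m: float  # input price per million tokens, in dollars

    @property
    def value(self) -> float:
        # Value = normalized score divided by input price per million tokens.
        if self.input_price_per_m <= 0:
            raise ValueError("input price must be positive")
        return self.score / self.input_price_per_m


def rank_by_value(entries):
    """Return entries sorted from highest to lowest value."""
    return sorted(entries, key=lambda e: e.value, reverse=True)


# Sample rows taken from the ranking on this page.
rows = [
    Entry("Claude Opus 4", 100.0, 15.00),
    Entry("DeepSeek R1", 99.1, 0.55),
    Entry("Llama 4 Scout", 52.7, 0.18),
    Entry("o3", 93.7, 10.00),
]

for entry in rank_by_value(rows):
    print(f"{entry.model}: {entry.value:.2f}")
```

With these rows, the highest-scoring model (Claude Opus 4 at 100.0) ranks last on value, because the metric divides by input price and accounts for nothing else.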

Each entry lists the model (vendor), its value, and in parentheses the normalized score / input price per million tokens.

  1. Gemini 2.0 Flash (Google): 680.20 (68.0 / $0.10/M)
  2. Qwen3 235B (Alibaba (Qwen)): 414.40 (82.9 / $0.20/M)
  3. GPT-4o mini (OpenAI): 387.40 (58.1 / $0.15/M)
  4. Llama 4 Scout (Meta): 292.78 (52.7 / $0.18/M)
  5. Llama 4 Maverick (Meta): 271.93 (73.4 / $0.27/M)
  6. DeepSeek V3 (DeepSeek): 230.22 (62.2 / $0.27/M)
  7. DeepSeek R1 (DeepSeek): 180.18 (99.1 / $0.55/M)
  8. o3-mini (OpenAI): 78.63 (86.5 / $1.10/M)
  9. Claude 3.5 Haiku (Anthropic): 77.70 (62.2 / $0.80/M)
  10. Gemini 2.5 Pro (Google): 76.40 (95.5 / $1.25/M)
  11. Llama 3.3 70B Instruct (Meta): 72.17 (63.5 / $0.88/M)
  12. Qwen2.5 72B Instruct (Alibaba (Qwen)): 61.57 (55.4 / $0.90/M)
  13. Gemini 1.5 Pro (Google): 35.31 (44.1 / $1.25/M)
  14. Grok 2 (xAI): 31.75 (63.5 / $2.00/M)
  15. Llama 3.1 70B Instruct (Meta): 31.74 (27.9 / $0.88/M)
  16. Claude Sonnet 4 (Anthropic): 31.08 (93.2 / $3.00/M)
  17. Claude 3.5 Sonnet (Anthropic): 29.13 (87.4 / $3.00/M)
  18. Mistral Large 2 (Mistral): 29.05 (58.1 / $2.00/M)
  19. GPT-4o (OpenAI): 28.65 (71.6 / $2.50/M)
  20. o1-mini (OpenAI): 27.18 (81.5 / $3.00/M)
  21. Grok 3 (xAI): 25.83 (77.5 / $3.00/M)
  22. Llama 3.1 405B Instruct (Meta): 18.92 (66.2 / $3.50/M)
  23. o3 (OpenAI): 9.37 (93.7 / $10.00/M)
  24. Claude Opus 4 (Anthropic): 6.67 (100.0 / $15.00/M)
  25. o1 (OpenAI): 5.44 (81.5 / $15.00/M)

AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.