AI Model Analyzer (AMA)

Value ranking

Best value on LiveCodeBench

A continuously refreshed coding benchmark that draws problems from LeetCode, AtCoder, and Codeforces; its rolling problem set reduces benchmark contamination.

“Value” is the normalized benchmark score (0–100 for this leaderboard cohort) divided by the input price per million tokens. A higher value means more capability per dollar on this axis only; always sanity-check latency, context length, and behavior on your real workload.

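If you want to recompute the Value column yourself, the minimal Python sketch below shows the arithmetic, assuming value is simply the benchmark score divided by the input price per million tokens. The value_score helper and the three sample rows are illustrative only; small differences from the published figures (for example 433.00 vs. 433.20 for Qwen3 235B) come from rounding in the scores shown here.

```python
# Minimal sketch of the "value" metric used in this ranking (assumption:
# value = benchmark score / input price in $ per million input tokens).
# The helper name and the sample rows are illustrative, not a real API.

def value_score(benchmark_score: float, input_price_per_m: float) -> float:
    """Benchmark points per dollar of input tokens (per million tokens)."""
    return benchmark_score / input_price_per_m

# (model, LiveCodeBench score, input price in $/M tokens) copied from the table below
models = [
    ("Qwen3 235B", 86.6, 0.20),
    ("o4-mini", 100.0, 1.10),
    ("o3", 95.1, 10.00),
]

# Rank the sample rows by value, highest first, and print each calculation.
for name, score, price in sorted(
    models, key=lambda m: value_score(m[1], m[2]), reverse=True
):
    print(f"{name:<12} value = {value_score(score, price):7.2f} ({score} / ${price:.2f}/M)")
```
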
| Rank | Model | Vendor | Value (score ÷ price) | LiveCodeBench score | Input price ($/M tokens) |
|------|-------|--------|-----------------------|---------------------|--------------------------|
| 1 | Qwen3 235B | Alibaba (Qwen) | 433.20 | 86.6 | $0.20 |
| 2 | Gemini 2.5 Flash | Google | 254.63 | 76.4 | $0.30 |
| 3 | DeepSeek R1 | DeepSeek | 171.51 | 94.3 | $0.55 |
| 4 | DeepSeek V3 | DeepSeek | 100.41 | 27.1 | $0.27 |
| 5 | o4-mini | OpenAI | 90.91 | 100.0 | $1.10 |
| 6 | Gemini 2.5 Pro | Google | 75.31 | 94.1 | $1.25 |
| 7 | o3-mini | OpenAI | 61.62 | 67.8 | $1.10 |
| 8 | Claude Sonnet 4 (Thinking) | Anthropic | 21.25 | 63.8 | $3.00 |
| 9 | Claude Sonnet 4 | Anthropic | 15.39 | 46.2 | $3.00 |
| 10 | o3 | OpenAI | 9.51 | 95.1 | $10.00 |
| 11 | Claude 3.5 Sonnet | Anthropic | 8.51 | 25.5 | $3.00 |
| 12 | Claude Opus 4 (Thinking) | Anthropic | 4.49 | 67.4 | $15.00 |
| 13 | Claude Opus 4 | Anthropic | 3.46 | 51.9 | $15.00 |
| 14 | GPT-4o | OpenAI | 2.15 | 5.4 | $2.50 |
| 15 | GPT-4 Turbo | OpenAI | 0.35 | 3.5 | $10.00 |
| 16 | GPT-4o mini | OpenAI | 0.00 | 0.0 | $0.15 |

AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.