Value ranking

Best value on Rolling Data Analysis

A rolling, contamination-controlled data-analysis evaluation covering table comprehension, CSV / spreadsheet reasoning, SQL-style joins, and chart interpretation. It is refreshed every six months with new tables and questions to minimise contamination.

“Value” is the normalized benchmark score (0–100 for this leaderboard cohort) divided by the input price in US dollars per million tokens. A higher value means more capability per dollar on this axis only; it does not account for latency, context length, or your real workload, so check those separately.
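As a minimal sketch of the ratio defined above, the Python below divides a normalized score by an input price. The function name `value_per_dollar`, the range check, and the rounding to two decimals are assumptions made for illustration, not the site's actual implementation.

```python
def value_per_dollar(score: float, input_price_per_million: float) -> float:
    """Return normalized score divided by input price (USD per million tokens)."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("score is expected to be normalized to 0-100")
    if input_price_per_million <= 0:
        raise ValueError("input price must be positive")
    # Rounding to two decimals mirrors how values are displayed (an assumption).
    return round(score / input_price_per_million, 2)


if __name__ == "__main__":
    # Figures taken from the ranking on this page:
    # a score of 76.8 at $0.20/M, and a score of 100.0 at $3.00/M.
    print(value_per_dollar(76.8, 0.20))
    print(value_per_dollar(100.0, 3.00))
```

Because the price is in the denominator, a cheap model with a middling score can outrank an expensive model with a top score, which is the pattern visible in the ranking.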

  1. Qwen3 235B (Thinking)
    Alibaba (Qwen)
    Value: 384.00
    Score / input price: 76.8 / $0.20/M
  2. DeepSeek V3 (Thinking)
    DeepSeek
    Value: 260.59
    Score / input price: 70.4 / $0.27/M
  3. Gemini 2.5 Pro (Max Thinking)
    Google
    Value: 57.20
    Score / input price: 71.5 / $1.25/M
  4. DeepSeek V3
    DeepSeek
    Value: 33.67
    Score / input price: 9.1 / $0.27/M
  5. Claude Sonnet 4 (Thinking)
    Anthropic
    Value: 33.33
    Score / input price: 100.0 / $3.00/M
  6. Qwen3 235B
    Alibaba (Qwen)
    Value: 30.80
    Score / input price: 6.2 / $0.20/M
  7. Claude Opus 4 (Thinking)
    Anthropic
    Value: 3.10
    Score / input price: 46.5 / $15.00/M
  8. Claude Opus 4
    Anthropic
    Value: 0.83
    Score / input price: 12.4 / $15.00/M
  9. Claude Sonnet 4
    Anthropic
    Value: 0.00
    Score / input price: 0.0 / $3.00/M
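
To sanity-check the ranking above, the sketch below recomputes each value from the displayed score and price, then sorts the rows. The `ROWS` structure and the `recompute` helper are hypothetical names introduced here. Three recomputed values differ slightly from the listed ones (for example, 70.4 / 0.27 gives 260.74 against a listed 260.59). One plausible explanation, which the page does not confirm, is that the listed values are computed from unrounded scores.

```python
# Each row: (model, displayed score, input price in USD per million tokens, listed value)
ROWS = [
    ("Qwen3 235B (Thinking)", 76.8, 0.20, 384.00),
    ("DeepSeek V3 (Thinking)", 70.4, 0.27, 260.59),
    ("Gemini 2.5 Pro (Max Thinking)", 71.5, 1.25, 57.20),
    ("DeepSeek V3", 9.1, 0.27, 33.67),
    ("Claude Sonnet 4 (Thinking)", 100.0, 3.00, 33.33),
    ("Qwen3 235B", 6.2, 0.20, 30.80),
    ("Claude Opus 4 (Thinking)", 46.5, 15.00, 3.10),
    ("Claude Opus 4", 12.4, 15.00, 0.83),
    ("Claude Sonnet 4", 0.0, 3.00, 0.00),
]


def recompute(rows):
    """Recompute score / price from the displayed figures, best value first."""
    out = [
        (name, round(score / price, 2), listed)
        for name, score, price, listed in rows
    ]
    return sorted(out, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, value, listed in recompute(ROWS):
        # The diff column shows how far the recomputed value is from the listed one.
        print(f"{name:32s} recomputed={value:8.2f} listed={listed:8.2f} "
              f"diff={value - listed:+.2f}")
```

The recomputed order matches the listed order, so the small numeric gaps do not change any rank in this cohort.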

AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.