Value ranking

Best value on Rolling Data Analysis

A rolling, contamination-controlled data-analysis evaluation covering table comprehension, CSV / spreadsheet reasoning, SQL-style joins, and chart interpretation. It is refreshed every six months with new tables and questions to minimise contamination.

“Value” is the normalized benchmark score (0–100 within this leaderboard cohort) divided by the input price in US dollars per million tokens. A higher value means more capability per dollar on this axis only, so always sanity-check latency, context length, and your real workload.
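The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the leaderboard's actual implementation: the names `Entry` and `rank_by_value` are hypothetical, and the three sample rows reuse the rounded figures shown on this page. Because the displayed scores are presumably rounded to one decimal, recomputing from them can differ slightly from the listed value (for example, 70.4 / 0.27 ≈ 260.74 against the listed 260.59).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical structure, for illustration only)."""
    model: str
    score: float        # normalized benchmark score, 0-100 within the cohort
    input_price: float  # US dollars per million input tokens

    @property
    def value(self) -> float:
        # Value = normalized score divided by input price per million tokens.
        if self.input_price <= 0:
            raise ValueError("input price must be positive")
        return self.score / self.input_price


def rank_by_value(entries):
    """Return the entries sorted from highest to lowest value."""
    return sorted(entries, key=lambda e: e.value, reverse=True)


# Sample rows copied from the rounded figures displayed on this page.
entries = [
    Entry("Claude Sonnet 4 (Thinking)", 100.0, 3.00),
    Entry("Qwen3 235B (Thinking)", 76.8, 0.20),
    Entry("Gemini 2.5 Pro (Max Thinking)", 71.5, 1.25),
]

for rank, entry in enumerate(rank_by_value(entries), start=1):
    print(f"{rank}. {entry.model}: {entry.value:.2f}")
```

The sample shows why the top raw score does not top the value ranking: a score of 100.0 at $3.00/M yields a lower value than 76.8 at $0.20/M.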

Rank  Model                          Provider        Value   Score / input price
1     Qwen3 235B (Thinking)          Alibaba (Qwen)  384.00  76.8 / $0.20/M
2     DeepSeek V3 (Thinking)         DeepSeek        260.59  70.4 / $0.27/M
3     Gemini 2.5 Pro (Max Thinking)  Google          57.20   71.5 / $1.25/M
4     DeepSeek V3                    DeepSeek        33.67   9.1 / $0.27/M
5     Claude Sonnet 4 (Thinking)     Anthropic       33.33   100.0 / $3.00/M
6     Qwen3 235B                     Alibaba (Qwen)  30.80   6.2 / $0.20/M
7     Claude Opus 4 (Thinking)       Anthropic       3.10    46.5 / $15.00/M
8     Claude Opus 4                  Anthropic       0.83    12.4 / $15.00/M
9     Claude Sonnet 4                Anthropic       0.00    0.0 / $3.00/M

AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.