Value ranking
Best value on Rolling Data Analysis
A rolling, contamination-controlled data-analysis evaluation covering table comprehension, CSV / spreadsheet reasoning, SQL-style joins, and chart interpretation. It is refreshed every six months with new tables and questions to minimise contamination.
“Value” is the normalized benchmark score (0–100 for this leaderboard cohort) divided by the input price in dollars per million tokens. Higher means more capability per dollar on this axis only; always sanity-check latency, context length, and your real workload.
- 1. Qwen3 235B (Thinking), Alibaba (Qwen): value 384.00 (score 76.8 / $0.20/M)
- 2. DeepSeek V3 (Thinking), DeepSeek: value 260.59 (score 70.4 / $0.27/M)
- 3. Gemini 2.5 Pro (Max Thinking), Google: value 57.20 (score 71.5 / $1.25/M)
- 4. DeepSeek V3, DeepSeek: value 33.67 (score 9.1 / $0.27/M)
- 5. Claude Sonnet 4 (Thinking), Anthropic: value 33.33 (score 100.0 / $3.00/M)
- 6. Qwen3 235B, Alibaba (Qwen): value 30.80 (score 6.2 / $0.20/M)
- 7. Claude Opus 4 (Thinking), Anthropic: value 3.10 (score 46.5 / $15.00/M)
- 8. Claude Opus 4, Anthropic: value 0.83 (score 12.4 / $15.00/M)
- 9. Claude Sonnet 4, Anthropic: value 0.00 (score 0.0 / $3.00/M)
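The value figures above follow from the stated definition, score divided by input price. The Python sketch below recomputes a few of them. The function name `value_score` and the `rows` structure are hypothetical and used only for illustration; the scores and prices are copied from the ranking as displayed. Recomputing from the displayed (rounded) scores can differ slightly from the listed value, for example 70.4 / 0.27 ≈ 260.74 against a listed 260.59. This is presumably because the listed values use unrounded scores, which is an assumption and not something the page states.

```python
def value_score(normalized_score: float, input_price_per_m: float) -> float:
    """Normalized benchmark score (0-100) divided by input price in USD per million tokens.

    Hypothetical helper for illustration; not the site's actual implementation.
    """
    if input_price_per_m <= 0:
        raise ValueError("input price must be positive")
    return normalized_score / input_price_per_m


# (model, displayed score, input price in $/M tokens), copied from the ranking
rows = [
    ("Claude Sonnet 4 (Thinking)", 100.0, 3.00),
    ("Qwen3 235B (Thinking)", 76.8, 0.20),
    ("Gemini 2.5 Pro (Max Thinking)", 71.5, 1.25),
]

# Sort by value, highest first, as the ranking does
ranked = sorted(rows, key=lambda r: value_score(r[1], r[2]), reverse=True)

for name, score, price in ranked:
    print(f"{name}: {value_score(score, price):.2f}")
```

Note how the highest raw score (100.0) ends up last of the three once price is taken into account, which is why the ranking should be read as capability per dollar and not capability alone.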
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.