Value ranking
Best value on ARC-AGI 2
Second-generation ARC challenge testing fluid reasoning over abstract visual puzzles. Resists training-data memorisation by construction: each puzzle is novel and solutions require multi-step pattern induction. Frontier models are only just starting to score above chance on the harder tier.
“Value” is the normalized benchmark score (0–100 for this leaderboard cohort) divided by input price per million tokens. Higher means more capability per dollar on this axis only; always sanity-check latency, context length, and your real workload.
| Rank | Model | Vendor | Value | ARC-AGI 2 score | Input price |
|---:|---|---|---:|---:|---:|
| 1 | Gemini 3 Flash | Google | 131.80 | 39.5 | $0.30/M |
| 2 | Gemini 3 Pro | Google | 72.57 | 90.7 | $1.25/M |
| 3 | GPT-5.5 | OpenAI | 66.67 | 100.0 | $1.50/M |
| 4 | GPT-5 nano | OpenAI | 61.40 | 3.1 | $0.05/M |
| 5 | GPT-5.4 | OpenAI | 58.00 | 87.0 | $1.50/M |
| 6 | GPT-5.2 | OpenAI | 49.80 | 62.3 | $1.25/M |
| 7 | Claude Sonnet 4.6 | Anthropic | 23.69 | 71.1 | $3.00/M |
| 8 | Kimi K2 | Moonshot (Kimi) | 23.15 | 13.9 | $0.60/M |
| 9 | GPT-5 mini | OpenAI | 20.88 | 5.2 | $0.25/M |
| 10 | DeepSeek V3 | DeepSeek | 17.56 | 4.7 | $0.27/M |
| 11 | GPT-5.1 | OpenAI | 16.60 | 20.8 | $1.25/M |
| 12 | Gemini 2.0 Flash | Google | 15.30 | 1.5 | $0.10/M |
| 13 | GPT-5 | OpenAI | 9.28 | 11.6 | $1.25/M |
| 14 | o4-mini | OpenAI | 6.54 | 7.2 | $1.10/M |
| 15 | Claude Opus 4.7 | Anthropic | 5.95 | 89.2 | $15.00/M |
| 16 | Claude Opus 4.6 | Anthropic | 5.43 | 81.4 | $15.00/M |
| 17 | Claude Sonnet 4.5 | Anthropic | 5.34 | 16.0 | $3.00/M |
| 18 | Claude Haiku 4.5 | Anthropic | 4.74 | 4.7 | $1.00/M |
| 19 | Grok 4 | xAI | 3.76 | 18.8 | $5.00/M |
| 20 | o3-mini | OpenAI | 3.20 | 3.5 | $1.10/M |
| 21 | Claude Opus 4.5 | Anthropic | 2.95 | 44.3 | $15.00/M |
| 22 | DeepSeek R1 | DeepSeek | 2.78 | 1.5 | $0.55/M |
| 23 | o3 | OpenAI | 0.77 | 7.7 | $10.00/M |
| 24 | Gemini 1.5 Pro | Google | 0.75 | 0.9 | $1.25/M |
| 25 | Claude Opus 4 | Anthropic | 0.68 | 10.1 | $15.00/M |
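The value metric described above can be reproduced directly from the table. A minimal sketch: the `value_score` helper is illustrative (not part of any vendor API), and the three example rows copy score and price figures from the leaderboard.

```python
def value_score(score: float, price_per_mtok: float) -> float:
    """Benchmark score (0-100) divided by input price per million tokens."""
    return score / price_per_mtok

# Example rows from the table: (ARC-AGI 2 score, input $ per million tokens).
models = {
    "GPT-5.5": (100.0, 1.50),
    "Gemini 3 Pro": (90.7, 1.25),
    "Claude Opus 4.7": (89.2, 15.00),
}

# Rank by value, highest first, mirroring the leaderboard ordering.
for name, (score, price) in sorted(
    models.items(), key=lambda kv: value_score(*kv[1]), reverse=True
):
    print(f"{name}: {value_score(score, price):.2f}")
```

Note how the division compresses the ranking: Claude Opus 4.7 scores nearly as high as GPT-5.5, but its much higher input price pushes it far down the value axis.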
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.