Value ranking
Best value on FrontierMath Tiers 1-3
Mathematical research problems spanning analysis, algebra, combinatorics, and number theory. Tiers 1-3 are progressively harder; even frontier reasoning models solve only a small fraction. It is the hardest publicly reported benchmark for general mathematical reasoning.
“Value” is the normalized benchmark score (0-100 for this leaderboard cohort) divided by the input price per million tokens. A higher value means more capability per dollar on this axis only; always sanity-check latency, context length, and your real workload.
| Rank | Model | Vendor | Value | Score | Input price ($/M tokens) |
|------|-------|--------|-------|-------|--------------------------|
| 1 | GPT-5 nano | OpenAI | 320.20 | 16.0 | $0.05 |
| 2 | Gemini 3 Flash | Google | 229.80 | 68.9 | $0.30 |
| 3 | GPT-5 mini | OpenAI | 210.76 | 52.7 | $0.25 |
| 4 | Kimi K2 | Moonshot (Kimi) | 89.95 | 54.0 | $0.60 |
| 5 | Qwen3 235B (Thinking) | Alibaba (Qwen) | 82.00 | 16.4 | $0.20 |
| 6 | GPT-5.5 | OpenAI | 66.67 | 100.0 | $1.50 |
| 7 | GPT-5.2 | OpenAI | 62.98 | 78.7 | $1.25 |
| 8 | GPT-5.4 | OpenAI | 61.38 | 92.1 | $1.50 |
| 9 | Gemini 3 Pro | Google | 58.18 | 72.7 | $1.25 |
| 10 | GPT-5 | OpenAI | 50.16 | 62.7 | $1.25 |
| 11 | GPT-5.1 | OpenAI | 48.02 | 60.0 | $1.25 |
| 12 | o4-mini | OpenAI | 43.65 | 48.0 | $1.10 |
| 13 | Gemini 2.0 Flash | Google | 33.30 | 3.3 | $0.10 |
| 14 | Gemini 2.5 Flash | Google | 31.23 | 9.4 | $0.30 |
| 15 | Gemini 2.5 Pro | Google | 21.88 | 27.4 | $1.25 |
| 16 | o3-mini | OpenAI | 21.83 | 24.0 | $1.10 |
| 17 | Claude Sonnet 4.6 | Anthropic | 20.89 | 62.7 | $3.00 |
| 18 | GLM-4.6 | Zhipu AI (GLM) | 14.78 | 7.4 | $0.50 |
| 19 | DeepSeek V3 | DeepSeek | 12.33 | 3.3 | $0.27 |
| 20 | Claude Haiku 4.5 | Anthropic | 11.42 | 11.4 | $1.00 |
| 21 | Claude Sonnet 4.5 | Anthropic | 9.82 | 29.4 | $3.00 |
| 22 | GLM-4.7 | Zhipu AI (GLM) | 9.44 | 4.7 | $0.50 |
| 23 | Grok 4 | xAI | 7.60 | 38.0 | $5.00 |
| 24 | Claude Opus 4.7 | Anthropic | 5.65 | 84.7 | $15.00 |
| 25 | Claude Opus 4.6 | Anthropic | 5.25 | 78.7 | $15.00 |
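The value metric defined above can be sketched in a few lines of Python. This is a minimal illustration, not the leaderboard's actual pipeline: the `value` helper name is hypothetical, and the rows are a small sample copied from the table.

```python
# Value metric sketch: benchmark score (0-100) divided by input price
# per million tokens. Rows are (model, score, input_price_per_M_usd),
# sampled from the ranking table above.
rows = [
    ("GPT-5 nano", 16.0, 0.05),
    ("Gemini 3 Flash", 68.9, 0.30),
    ("Claude Opus 4.6", 78.7, 15.00),
]

def value(score: float, price_per_m: float) -> float:
    """Capability per dollar on this single axis: score / input price."""
    return score / price_per_m

# Rank models by value, highest first, as in the leaderboard.
ranked = sorted(rows, key=lambda r: value(r[1], r[2]), reverse=True)
for name, score, price in ranked:
    print(f"{name}: value {value(score, price):.2f}")
```

Note that a cheap, modestly capable model (GPT-5 nano) can outrank a far stronger but expensive one (Claude Opus 4.6) on this axis, which is exactly why the definition warns you to check latency, context length, and your real workload separately.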
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.