Value ranking
Best value on AIME 2024
Problems from the 2024 American Invitational Mathematics Examination. Answers are integers from 0 to 999; the benchmark is very hard for non-reasoning models.
“Value” is the normalized benchmark score (0–100 for this leaderboard cohort) divided by the input price per million tokens. Higher means more capability per dollar on this axis only; always sanity-check latency, context length, and fit with your actual workload. A worked recomputation follows the table.
| Rank | Model | Creator | Value | AIME 2024 score | Input price ($/M tokens) |
|-----:|:------|:--------|------:|----------------:|-------------------------:|
| 1 | Qwen3 235B | Alibaba (Qwen) | 433.40 | 86.7 | $0.20 |
| 2 | Gemini 2.0 Flash | Google | 308.10 | 30.8 | $0.10 |
| 3 | Llama 4 Scout | Meta | 210.67 | 37.9 | $0.18 |
| 4 | Llama 4 Maverick | Meta | 170.56 | 46.0 | $0.27 |
| 5 | DeepSeek R1 | DeepSeek | 154.11 | 84.8 | $0.55 |
| 6 | DeepSeek V3 | DeepSeek | 144.22 | 38.9 | $0.27 |
| 7 | o3-mini | OpenAI | 84.75 | 93.2 | $1.10 |
| 8 | Gemini 2.5 Pro | Google | 78.82 | 98.5 | $1.25 |
| 9 | Gemini 1.5 Flash | Google | 34.67 | 2.6 | $0.08 |
| 10 | Grok 3 | xAI | 33.33 | 100.0 | $3.00 |
| 11 | Llama 3.3 70B Instruct | Meta | 32.45 | 28.6 | $0.88 |
| 12 | GPT-4o mini | OpenAI | 28.60 | 4.3 | $0.15 |
| 13 | Qwen2.5 72B Instruct | Alibaba (Qwen) | 23.32 | 21.0 | $0.90 |
| 14 | o1-mini | OpenAI | 19.56 | 58.7 | $3.00 |
| 15 | Llama 3.1 70B Instruct | Meta | 15.39 | 13.5 | $0.88 |
| 16 | Claude Sonnet 4 | Anthropic | 12.04 | 36.1 | $3.00 |
| 17 | Gemini 1.5 Pro | Google | 11.10 | 13.9 | $1.25 |
| 18 | o3 | OpenAI | 9.50 | 95.0 | $10.00 |
| 19 | Mistral Large 2 | Mistral | 6.94 | 13.9 | $2.00 |
| 20 | Grok 2 | xAI | 6.77 | 13.5 | $2.00 |
| 21 | Llama 3.1 405B Instruct | Meta | 5.90 | 20.6 | $3.50 |
| 22 | o1 | OpenAI | 5.61 | 84.1 | $15.00 |
| 23 | Claude Opus 4 | Anthropic | 5.33 | 79.9 | $15.00 |
| 24 | Claude 3.5 Sonnet | Anthropic | 4.25 | 12.8 | $3.00 |
| 25 | GPT-4o | OpenAI | 3.93 | 9.8 | $2.50 |
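As a sanity check, the metric can be recomputed from the two published columns. A minimal sketch, assuming the displayed Value is simply the AIME 2024 score divided by the input price per million tokens; small discrepancies against the listed Value come from the one-decimal rounding of the displayed scores:

```python
# Recompute Value = AIME 2024 score (0-100) / input price ($ per 1M tokens).
# Scores below are the rounded figures from the table, so results can drift
# slightly from the listed Value (e.g. 86.7 / 0.20 = 433.50 vs. listed 433.40).

rows = [
    # (model, AIME 2024 score, input price in $ per 1M tokens)
    ("Qwen3 235B", 86.7, 0.20),
    ("o3-mini", 93.2, 1.10),
    ("o3", 95.0, 10.00),
]

for model, score, price in rows:
    print(f"{model}: value = {score / price:.2f}")
```

Running this prints 433.50, 84.73, and 9.50, matching the table within rounding.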
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.