Value ranking
Best value on OTIS Mock AIME 2024-2025
AIME-style competition problems written specifically for the OTIS mock contest, then run as an evaluation by Epoch AI. Closer in spirit to the public AIME but with novel problems unlikely to appear in training data.
“Value” is the normalized benchmark score (0–100 for this leaderboard cohort) divided by the input price per million tokens. Higher means more capability per dollar on this axis only; always sanity-check latency, context length, and your real workload before choosing a model.
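The metric above is a simple ratio, sketched here as a one-line helper (the function name is illustrative, not from any vendor API):

```python
def value_score(benchmark_score: float, input_price_per_mtok: float) -> float:
    """Normalized benchmark score (0-100) divided by input price
    in dollars per million tokens, rounded to two decimals."""
    return round(benchmark_score / input_price_per_mtok, 2)

# Top-ranked row from the table: 80.4 at $0.05/M input tokens.
print(value_score(80.4, 0.05))  # → 1608.0
```

Small rounding differences against the table are expected, since the published scores are themselves rounded.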
| Rank | Model | Vendor | Value | Score (0–100) | Input price |
|---|---|---|---|---|---|
| 1 | GPT-5 nano | OpenAI | 1608.00 | 80.4 | $0.05/M |
| 2 | Qwen3 235B (Thinking) | Alibaba (Qwen) | 430.85 | 86.2 | $0.20/M |
| 3 | GPT-5 mini | OpenAI | 344.68 | 86.2 | $0.25/M |
| 4 | Gemini 3 Flash | Google | 308.37 | 92.5 | $0.30/M |
| 5 | Gemini 2.0 Flash | Google | 285.30 | 28.5 | $0.10/M |
| 6 | Gemini 1.5 Flash | Google | 174.80 | 13.1 | $0.08/M |
| 7 | GLM-4.7 | Zhipu AI (GLM) | 165.42 | 82.7 | $0.50/M |
| 8 | DeepSeek R1 | DeepSeek | 158.84 | 87.4 | $0.55/M |
| 9 | Kimi K2 | Moonshot (Kimi) | 153.18 | 91.9 | $0.60/M |
| 10 | DeepSeek V3 | DeepSeek | 131.30 | 35.5 | $0.27/M |
| 11 | GPT-5.2 | OpenAI | 76.78 | 96.0 | $1.25/M |
| 12 | Gemini 3 Pro | Google | 76.35 | 95.4 | $1.25/M |
| 13 | o4-mini | OpenAI | 73.62 | 81.0 | $1.10/M |
| 14 | GPT-5 | OpenAI | 72.86 | 91.1 | $1.25/M |
| 15 | GPT-5.1 | OpenAI | 70.54 | 88.2 | $1.25/M |
| 16 | o3-mini | OpenAI | 69.16 | 76.1 | $1.10/M |
| 17 | Gemini 2.5 Pro | Google | 66.86 | 83.6 | $1.25/M |
| 18 | GPT-5.5 | OpenAI | 66.67 | 100.0 | $1.50/M |
| 19 | Claude Haiku 4.5 | Anthropic | 65.42 | 65.4 | $1.00/M |
| 20 | GPT-5.4 | OpenAI | 63.41 | 95.1 | $1.50/M |
| 21 | Claude Sonnet 4.6 | Anthropic | 28.42 | 85.3 | $3.00/M |
| 22 | Claude Sonnet 4.5 | Anthropic | 25.65 | 77.0 | $3.00/M |
| 23 | Llama 4 Scout | Meta | 24.00 | 4.3 | $0.18/M |
| 24 | GPT-4o mini | OpenAI | 23.07 | 3.5 | $0.15/M |
| 25 | Claude 3.7 Sonnet | Anthropic | 18.73 | 56.2 | $3.00/M |
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.