Scenario guide
Best AI models for Math / Reasoning Tutor
This scenario covers tutoring students through hard math problems. It is reasoning-heavy, so cost matters less: depth of thought is where the value lies. The weighting favours contamination-resistant problem sets (FrontierMath, OTIS Mock AIME), since AIME 2024 and MATH-500 are now in most training corpora.
Rankings use the same scenario weights and cost blending as the interactive leaderboard on AI Model Analyzer. Data is min-max normalised per benchmark; missing scores are skipped without penalty.
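As a rough illustration of the normalisation step described above, the sketch below min-max scales each benchmark to 0..100 and then takes a weighted mean over only the benchmarks a model has a score for. Everything specific here is an assumption: the model names, benchmark names, raw values and weights are made up, and reading "skipped without penalty" as "renormalise the remaining weights" is one plausible interpretation, not the site's confirmed method. Cost blending is left out because the guide does not state its formula.

```python
# Illustrative raw scores only (NOT leaderboard data). None = no score reported.
RAW = {
    "model_a": {"bench_x": 40.0, "bench_y": 90.0},
    "model_b": {"bench_x": 20.0, "bench_y": None},
    "model_c": {"bench_x": 30.0, "bench_y": 70.0},
}

# Hypothetical scenario weights; the real weights are not given in this guide.
WEIGHTS = {"bench_x": 0.75, "bench_y": 0.25}


def min_max_normalise(raw):
    """Rescale each benchmark to 0..100 across the models that report it."""
    benchmarks = {bench for scores in raw.values() for bench in scores}
    normalised = {model: {} for model in raw}
    for bench in benchmarks:
        present = [s[bench] for s in raw.values() if s.get(bench) is not None]
        if not present:
            continue  # nobody reported this benchmark
        lo, hi = min(present), max(present)
        for model, scores in raw.items():
            value = scores.get(bench)
            if value is None:
                normalised[model][bench] = None  # keep the gap visible
            elif hi == lo:
                normalised[model][bench] = 100.0  # all models tied
            else:
                normalised[model][bench] = 100.0 * (value - lo) / (hi - lo)
    return normalised


def quality(norm_scores, weights):
    """Weighted mean over available benchmarks.

    Assumption: a missing score is skipped and the remaining weights are
    renormalised, so the gap neither helps nor hurts the model.
    """
    used = [
        (weights[bench], value)
        for bench, value in norm_scores.items()
        if value is not None and bench in weights
    ]
    total_weight = sum(w for w, _ in used)
    if total_weight == 0:
        return None  # no usable benchmarks for this model
    return sum(w * v for w, v in used) / total_weight


if __name__ == "__main__":
    norm = min_max_normalise(RAW)
    for model in sorted(RAW):
        print(model, quality(norm[model], WEIGHTS))
```

With this reading, a model that lacks a score on one benchmark is judged only on the benchmarks it does have, rather than being treated as scoring zero on the missing one.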
- 1. GPT-5.5 (OpenAI) | Score 93.4 | Q 100.0 | In $1.50/M
- 2. GPT-5.4 (OpenAI) | Score 87.7 | Q 93.6 | In $1.50/M
- 3. Claude Opus 4.7 (Anthropic) | Score 81.7 | Q 90.7 | In $15.00/M
- 4. GPT-5.2 (OpenAI) | Score 81.2 | Q 86.1 | In $1.25/M
- 5. DeepSeek R1 (DeepSeek) | Score 80.5 | Q 82.4 | In $0.55/M
- 6. Gemini 3 Pro (Google) | Score 79.7 | Q 84.4 | In $1.25/M
- 7. Claude Opus 4.6 (Anthropic) | Score 77.6 | Q 86.3 | In $15.00/M
- 8. Gemini 3 Flash (Google) | Score 77.2 | Q 78.8 | In $0.30/M
- 9. Qwen3 235B (Alibaba (Qwen)) | Score 73.6 | Q 72.1 | In $0.20/M
- 10. GPT-5 (OpenAI) | Score 71.0 | Q 74.8 | In $1.25/M
- 11. Claude Sonnet 4.6 (Anthropic) | Score 69.7 | Q 74.2 | In $3.00/M
- 12. Kimi K2 (Moonshot (Kimi)) | Score 69.1 | Q 69.9 | In $0.60/M
- 13. GPT-5.1 (OpenAI) | Score 68.8 | Q 72.3 | In $1.25/M
- 14. o1 (OpenAI) | Score 67.0 | Q 74.1 | In $15.00/M
- 15. DeepSeek V3 (Thinking) (DeepSeek) | Score 66.3 | Q 65.1 | In $0.27/M