Scenario guide

Best AI models for Math / Reasoning Tutor

This scenario ranks models for tutoring students through hard math. The work is reasoning-heavy, so cost matters less than depth of thought. The ranking is weighted toward contamination-resistant problem sets (FrontierMath, OTIS Mock AIME) because AIME 2024 and MATH-500 are now in most training corpora.

Rankings use the same scenario weights and cost blending as the interactive leaderboard on AI Model Analyzer. Data is min-max normalised per benchmark; missing scores are skipped without penalty.
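As a rough illustration of the scoring described above, the sketch below min-max normalises each benchmark across models and then takes a weighted average that skips missing scores by renormalising over the weights that remain. It is not the site's implementation: the model names, raw scores, and weights are made up, the handling of a benchmark where every model ties is an assumption, and cost blending is left out because its formula is not given here.

```python
from typing import Dict, Optional

# model -> benchmark -> raw score, or None when the model has no score
Scores = Dict[str, Dict[str, Optional[float]]]


def min_max_normalise(raw: Scores) -> Scores:
    """Rescale each benchmark to 0..100 across models; missing scores stay None."""
    benchmarks = {bench for per_model in raw.values() for bench in per_model}
    normalised: Scores = {model: {} for model in raw}
    for bench in benchmarks:
        present = [s[bench] for s in raw.values() if s.get(bench) is not None]
        if not present:
            continue  # no model has a score for this benchmark
        lo, hi = min(present), max(present)
        for model, per_model in raw.items():
            value = per_model.get(bench)
            if value is None:
                normalised[model][bench] = None
            elif hi == lo:
                # Assumption: when every model ties, give all of them full marks.
                normalised[model][bench] = 100.0
            else:
                normalised[model][bench] = 100.0 * (value - lo) / (hi - lo)
    return normalised


def weighted_score(
    norm: Dict[str, Optional[float]], weights: Dict[str, float]
) -> Optional[float]:
    """Weighted mean over the benchmarks a model actually has a score for."""
    total = 0.0
    total_weight = 0.0
    for bench, weight in weights.items():
        value = norm.get(bench)
        if value is None:
            continue  # skipped without penalty: its weight is simply not counted
        total += weight * value
        total_weight += weight
    return total / total_weight if total_weight > 0 else None


# Hypothetical raw scores and weights, for illustration only.
raw_scores: Scores = {
    "model_a": {"frontiermath": 40.0, "otis_mock_aime": 90.0},
    "model_b": {"frontiermath": 20.0, "otis_mock_aime": None},
    "model_c": {"frontiermath": 30.0, "otis_mock_aime": 70.0},
}
scenario_weights = {"frontiermath": 0.6, "otis_mock_aime": 0.4}

normalised = min_max_normalise(raw_scores)
scores = {m: weighted_score(normalised[m], scenario_weights) for m in raw_scores}

for model, score in sorted(scores.items(), key=lambda kv: kv[1] or 0.0, reverse=True):
    print(model, score)
```

In this toy data, model_b has no OTIS Mock AIME score, so its result rests entirely on FrontierMath rather than being dragged down by a zero for the missing benchmark.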

Rank  Model              Provider         Score  Q      In
1     GPT-5.5            OpenAI           93.4   100.0  $1.50/M
2     GPT-5.4            OpenAI           87.4   93.3   $1.50/M
3     Claude Opus 4.7    Anthropic        81.2   90.2   $15.00/M
4     GPT-5.2            OpenAI           80.9   85.8   $1.25/M
5     DeepSeek R1        DeepSeek         80.1   81.9   $0.55/M
6     Gemini 3 Pro       Google           78.8   83.5   $1.25/M
7     Claude Opus 4.6    Anthropic        77.1   85.7   $15.00/M
8     Gemini 3 Flash     Google           76.8   78.4   $0.30/M
9     Qwen3 235B         Alibaba (Qwen)   73.1   71.6   $0.20/M
10    GPT-5              OpenAI           70.8   74.5   $1.25/M
11    Claude Sonnet 4.6  Anthropic        69.2   73.7   $3.00/M
12    Kimi K2            Moonshot (Kimi)  68.7   69.5   $0.60/M
13    GPT-5.1            OpenAI           68.5   72.0   $1.25/M
14    o1                 OpenAI           66.3   73.3   $15.00/M
15    GPT-5 mini         OpenAI           66.1   66.1   $0.25/M
