
Scenario guide

Best AI models for Math / Reasoning Tutor

Tutors students through hard math. The workload is reasoning-heavy, so cost matters less: depth of reasoning is what delivers the value. Weighting favours contamination-resistant problem sets (FrontierMath, OTIS Mock AIME) because AIME 2024 and MATH-500 now appear in most training corpora.

Rankings use the same scenario weights and cost blending as the interactive leaderboard on AI Model Analyzer. Data is min-max normalised per benchmark; missing scores are skipped without penalty.
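The normalisation rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the AI Model Analyzer implementation. The benchmark weights, model names and scores are all hypothetical. A missing score is handled by dropping that benchmark for that model and renormalising the remaining weights, which is one way to read "skipped without penalty". A benchmark on which every model ties is scored 100 for all of them. Cost blending is left out because its formula is not given here.

```python
from typing import Dict, Optional

# Hypothetical raw scores (percent correct); None marks a missing result.
RawScores = Dict[str, Dict[str, Optional[float]]]

RAW: RawScores = {
    "model_a": {"FrontierMath": 30.0, "OTIS Mock AIME": 80.0, "AIME 2024": 95.0, "MATH-500": 98.0},
    "model_b": {"FrontierMath": 10.0, "OTIS Mock AIME": 60.0, "AIME 2024": 90.0, "MATH-500": 96.0},
    "model_c": {"FrontierMath": None, "OTIS Mock AIME": 70.0, "AIME 2024": 85.0, "MATH-500": 97.0},
}

# Hypothetical weights: contamination-resistant sets count three times as much.
WEIGHTS = {"FrontierMath": 3.0, "OTIS Mock AIME": 3.0, "AIME 2024": 1.0, "MATH-500": 1.0}


def min_max_normalise(scores: RawScores) -> Dict[str, Dict[str, float]]:
    """Rescale each benchmark to [0, 100] using that benchmark's own min and max."""
    benchmarks = {b for per_model in scores.values() for b in per_model}
    normalised: Dict[str, Dict[str, float]] = {model: {} for model in scores}
    for bench in benchmarks:
        # Only models that actually report this benchmark take part.
        present = {m: s[bench] for m, s in scores.items() if s.get(bench) is not None}
        if not present:
            continue
        lo, hi = min(present.values()), max(present.values())
        for model, value in present.items():
            # Assumption: if every model ties, all of them get the top score.
            normalised[model][bench] = 100.0 if hi == lo else 100.0 * (value - lo) / (hi - lo)
    return normalised


def weighted_quality(per_bench: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean over the benchmarks this model has; missing ones are dropped."""
    used = {b: w for b, w in weights.items() if b in per_bench}
    total = sum(used.values())
    if total == 0:
        return 0.0
    return sum(per_bench[b] * w for b, w in used.items()) / total


normalised = min_max_normalise(RAW)
quality = {m: weighted_quality(n, WEIGHTS) for m, n in normalised.items()}
ranking = sorted(quality, key=quality.get, reverse=True)
print(ranking)
```

Here `model_c` has no FrontierMath result, so its score is averaged over the other three benchmarks only rather than counting the gap as a zero.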

  1. GPT-5.5
    OpenAI
    Score 93.4 · Quality 100.0 · Input $1.50/M tokens
  2. GPT-5.4
    OpenAI
    Score 87.7 · Quality 93.6 · Input $1.50/M tokens
  3. Claude Opus 4.7
    Anthropic
    Score 81.7 · Quality 90.7 · Input $15.00/M tokens
  4. GPT-5.2
    OpenAI
    Score 81.2 · Quality 86.1 · Input $1.25/M tokens
  5. DeepSeek R1
    DeepSeek
    Score 80.5 · Quality 82.4 · Input $0.55/M tokens
  6. Gemini 3 Pro
    Google
    Score 79.7 · Quality 84.4 · Input $1.25/M tokens
  7. Claude Opus 4.6
    Anthropic
    Score 77.6 · Quality 86.3 · Input $15.00/M tokens
  8. Gemini 3 Flash
    Google
    Score 77.2 · Quality 78.8 · Input $0.30/M tokens
  9. Qwen3 235B
    Alibaba (Qwen)
    Score 73.6 · Quality 72.1 · Input $0.20/M tokens
  10. GPT-5
    OpenAI
    Score 71.0 · Quality 74.8 · Input $1.25/M tokens
  11. Claude Sonnet 4.6
    Anthropic
    Score 69.7 · Quality 74.2 · Input $3.00/M tokens
  12. Kimi K2
    Moonshot (Kimi)
    Score 69.1 · Quality 69.9 · Input $0.60/M tokens
  13. GPT-5.1
    OpenAI
    Score 68.8 · Quality 72.3 · Input $1.25/M tokens
  14. o1
    OpenAI
    Score 67.0 · Quality 74.1 · Input $15.00/M tokens
  15. DeepSeek V3 (Thinking)
    DeepSeek
    Score 66.3 · Quality 65.1 · Input $0.27/M tokens