AMA

Value ranking

Best value on MMMU

Massive Multi-discipline Multimodal Understanding: college-exam-level questions with images across 30+ subjects.

“Value” is the normalized benchmark score (0–100 within this leaderboard cohort) divided by the input price per million tokens. A higher value means more capability per dollar on this axis only; always sanity-check latency, context length, and your real workload. A worked sketch of the calculation follows the table.

Rank  Model              Vendor      Value  Score / Input price
   1  Gemini 2.0 Flash   Google     582.10  58.2 / $0.10/M
   2  Llama 4 Scout      Meta       275.72  49.6 / $0.18/M
   3  Llama 4 Maverick   Meta       239.07  64.5 / $0.27/M
   4  GPT-4o mini        OpenAI      82.07  12.3 / $0.15/M
   5  Gemini 2.5 Pro     Google      76.42  95.5 / $1.25/M
   6  o3-mini            OpenAI      74.29  81.7 / $1.10/M
   7  Gemini 1.5 Pro     Google      29.26  36.6 / $1.25/M
   8  Grok 3             xAI         27.24  81.7 / $3.00/M
   9  Claude Sonnet 4    Anthropic   23.51  70.5 / $3.00/M
  10  GPT-4o             OpenAI      19.40  48.5 / $2.50/M
  11  Grok 2             xAI         18.66  37.3 / $2.00/M
  12  o1-mini            OpenAI      18.53  55.6 / $3.00/M
  13  Claude 3.5 Sonnet  Anthropic   17.79  53.4 / $3.00/M
  14  o3                 OpenAI      10.00  100.0 / $10.00/M
  15  Claude Opus 4      Anthropic    5.95  89.2 / $15.00/M
  16  o1                 OpenAI       5.50  82.5 / $15.00/M
  17  Claude 3 Opus      Anthropic    0.82  12.3 / $15.00/M
  18  Gemini 1.5 Flash   Google       0.00  0.0 / $0.08/M
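
The snippet below is a minimal sketch of how the "Value" column can be reproduced. The leaderboard publishes only the rounded score and price for each row and does not state the exact normalization, so the min-max step is an assumption (it is consistent with o3 showing 100.0 and Gemini 1.5 Flash showing 0.0); the names `normalize_scores` and `mmmu_value` are illustrative only, not part of any published API.

```python
# Sketch of the value metric: normalized benchmark score / input price ($/M tokens).

def normalize_scores(raw: dict[str, float]) -> dict[str, float]:
    """Assumed min-max normalization of raw MMMU scores to a 0-100 range within
    the cohort. Shown only to document the assumed scaling; raw (pre-normalization)
    scores are not published on this page."""
    lo, hi = min(raw.values()), max(raw.values())
    return {model: 100.0 * (score - lo) / (hi - lo) for model, score in raw.items()}

def mmmu_value(normalized_score: float, input_price_per_m: float) -> float:
    """Capability per dollar on this axis: normalized score / input $/M tokens."""
    return normalized_score / input_price_per_m

# Two rows from the table above (displayed scores are rounded, so the last
# digit can differ from the published value):
print(round(mmmu_value(95.5, 1.25), 2))   # 76.4   (Gemini 2.5 Pro, table: 76.42)
print(round(mmmu_value(58.2, 0.10), 2))   # 582.0  (Gemini 2.0 Flash, table: 582.10)
```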

AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.