Value ranking
Best value on MMMU
Massive Multi-discipline Multimodal Understanding: college-exam-level questions with images across 30+ subjects.
“Value” is the normalized benchmark score (0–100 within this leaderboard cohort) divided by the input price per million tokens; a worked example follows the table below. A higher value means more capability per dollar on this axis only: always sanity-check latency, context length, and your real workload.
| Rank | Model | Vendor | Value (score ÷ price) | MMMU score (normalized) | Input price (per 1M tokens) |
|---|---|---|---|---|---|
| 1 | Gemini 2.0 Flash | Google | 582.10 | 58.2 | $0.10 |
| 2 | Llama 4 Scout | Meta | 275.72 | 49.6 | $0.18 |
| 3 | Llama 4 Maverick | Meta | 239.07 | 64.5 | $0.27 |
| 4 | GPT-4o mini | OpenAI | 82.07 | 12.3 | $0.15 |
| 5 | Gemini 2.5 Pro | Google | 76.42 | 95.5 | $1.25 |
| 6 | o3-mini | OpenAI | 74.29 | 81.7 | $1.10 |
| 7 | Gemini 1.5 Pro | Google | 29.26 | 36.6 | $1.25 |
| 8 | Grok 3 | xAI | 27.24 | 81.7 | $3.00 |
| 9 | Claude Sonnet 4 | Anthropic | 23.51 | 70.5 | $3.00 |
| 10 | GPT-4o | OpenAI | 19.40 | 48.5 | $2.50 |
| 11 | Grok 2 | xAI | 18.66 | 37.3 | $2.00 |
| 12 | o1-mini | OpenAI | 18.53 | 55.6 | $3.00 |
| 13 | Claude 3.5 Sonnet | Anthropic | 17.79 | 53.4 | $3.00 |
| 14 | o3 | OpenAI | 10.00 | 100.0 | $10.00 |
| 15 | Claude Opus 4 | Anthropic | 5.95 | 89.2 | $15.00 |
| 16 | o1 | OpenAI | 5.50 | 82.5 | $15.00 |
| 17 | Claude 3 Opus | Anthropic | 0.82 | 12.3 | $15.00 |
| 18 | Gemini 1.5 Flash | Google | 0.00 | 0.0 | $0.08 |
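
To make the ranking formula concrete, here is a minimal sketch of the calculation, assuming "value" is a plain ratio of the normalized score to the input price. The `value_score` helper is hypothetical (not part of any tool or API), and the numbers are copied from the o3 row of the table above.

```python
# Minimal sketch of the "value" formula described above, assuming a plain ratio
# of normalized score to input price. The function name is illustrative only.

def value_score(normalized_score: float, input_price_per_mtok: float) -> float:
    """Benchmark points per dollar of input tokens (higher is better)."""
    return normalized_score / input_price_per_mtok

# Worked example using the o3 row from the table: 100.0 / $10.00 per 1M tokens.
print(f"{value_score(100.0, 10.00):.2f}")  # 10.00
```

Where a displayed value differs slightly from this ratio (e.g. 582.10 versus 58.2 / 0.10 = 582.0), the gap is presumably rounding in the displayed score or price, not a different formula.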
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.