
Value ranking

Best value on MathVista

Math reasoning over visual contexts (charts, figures, geometry).

“Value” is the normalized benchmark score (0–100 within this leaderboard cohort) divided by the input price per million tokens. Displayed scores and prices are rounded; the value column is computed from unrounded figures, so dividing the displayed numbers may differ slightly. Higher means more capability per dollar on this axis only; always sanity-check latency, context length, and your real workload.
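For concreteness, here is a minimal sketch of how a value figure like this can be computed, assuming the normalization is min-max scaling of raw benchmark scores across the cohort (so the best model maps to 100 and the worst to 0, matching the 100.0 and 0.0 endpoints in the table below). The model names, scores, and prices in the snippet are placeholders, not the leaderboard's source data.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Scale raw benchmark scores to 0-100 across the cohort.

    Min-max scaling is an assumption about how this leaderboard
    normalizes; the site does not document its exact scheme.
    """
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # degenerate cohort: every model gets the top score
        return {m: 100.0 for m in scores}
    return {m: 100.0 * (s - lo) / (hi - lo) for m, s in scores.items()}


def value(normalized_score: float, input_price_per_mtok: float) -> float:
    """Value = normalized score / input price per million input tokens."""
    return normalized_score / input_price_per_mtok


# Placeholder raw scores and input prices ($ per million input tokens).
raw_scores = {"model_a": 86.5, "model_b": 62.0, "model_c": 41.3}
prices = {"model_a": 10.00, "model_b": 0.10, "model_c": 3.00}

norm = min_max_normalize(raw_scores)
for m in sorted(norm, key=lambda m: value(norm[m], prices[m]), reverse=True):
    print(f"{m}: value={value(norm[m], prices[m]):.2f} "
          f"({norm[m]:.1f} / ${prices[m]:.2f}/M)")
```

Because price sits in the denominator, very cheap models dominate this metric even at modest scores; that is why a 62-point model can top the table while a 95-point model sits mid-pack.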

| Rank | Model             | Vendor    | Value  | Score / input price |
|-----:|-------------------|-----------|-------:|---------------------|
| 1    | Gemini 2.0 Flash  | Google    | 622.60 | 62.3 / $0.10/M      |
| 2    | Llama 4 Scout     | Meta      | 309.17 | 55.6 / $0.18/M      |
| 3    | Gemini 1.5 Flash  | Google    | 290.13 | 21.8 / $0.075/M     |
| 4    | Llama 4 Maverick  | Meta      | 236.70 | 63.9 / $0.27/M      |
| 5    | GPT-4o mini       | OpenAI    | 113.87 | 17.1 / $0.15/M      |
| 6    | o3-mini           | OpenAI    | 76.38  | 84.0 / $1.10/M      |
| 7    | Gemini 2.5 Pro    | Google    | 76.03  | 95.0 / $1.25/M      |
| 8    | Gemini 1.5 Pro    | Google    | 29.53  | 36.9 / $1.25/M      |
| 9    | Grok 2            | xAI       | 25.48  | 51.0 / $2.00/M      |
| 10   | Claude Sonnet 4   | Anthropic | 25.25  | 75.8 / $3.00/M      |
| 11   | Grok 3            | xAI       | 22.50  | 67.5 / $3.00/M      |
| 12   | o1-mini           | OpenAI    | 17.91  | 53.7 / $3.00/M      |
| 13   | Claude 3.5 Sonnet | Anthropic | 15.79  | 47.4 / $3.00/M      |
| 14   | GPT-4o            | OpenAI    | 14.66  | 36.6 / $2.50/M      |
| 15   | o3                | OpenAI    | 10.00  | 100.0 / $10.00/M    |
| 16   | Claude Opus 4     | Anthropic | 6.15   | 92.3 / $15.00/M     |
| 17   | o1                | OpenAI    | 4.30   | 64.5 / $15.00/M     |
| 18   | Claude 3 Opus     | Anthropic | 0.00   | 0.0 / $15.00/M      |

AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.