Value ranking
Best value on AIME 2024
Problems from the 2024 American Invitational Mathematics Examination. Answers are integers from 0 to 999; the benchmark is very hard for non-reasoning models.
“Value” is the normalized benchmark score (0–100 for this leaderboard cohort) divided by the input price per million tokens. Higher means more capability per dollar on this axis only; always sanity-check latency, context length, and fit with your actual workload. A worked recomputation follows the table.
| Rank | Model | Creator | Value | AIME 2024 score | Input price ($/M tokens) |
|-----:|:------|:--------|------:|----------------:|-------------------------:|
| 1 | Qwen3 235B | Alibaba (Qwen) | 433.40 | 86.7 | $0.20 |
| 2 | Gemini 2.0 Flash | Google | 308.10 | 30.8 | $0.10 |
| 3 | Llama 4 Scout | Meta | 210.67 | 37.9 | $0.18 |
| 4 | Llama 4 Maverick | Meta | 170.56 | 46.0 | $0.27 |
| 5 | DeepSeek R1 | DeepSeek | 154.11 | 84.8 | $0.55 |
| 6 | DeepSeek V3 | DeepSeek | 144.22 | 38.9 | $0.27 |
| 7 | o3-mini | OpenAI | 84.75 | 93.2 | $1.10 |
| 8 | Gemini 2.5 Pro | Google | 78.82 | 98.5 | $1.25 |
| 9 | Gemini 1.5 Flash | Google | 34.67 | 2.6 | $0.08 |
| 10 | Grok 3 | xAI | 33.33 | 100.0 | $3.00 |
| 11 | Llama 3.3 70B Instruct | Meta | 32.45 | 28.6 | $0.88 |
| 12 | GPT-4o mini | OpenAI | 28.60 | 4.3 | $0.15 |
| 13 | Qwen2.5 72B Instruct | Alibaba (Qwen) | 23.32 | 21.0 | $0.90 |
| 14 | o1-mini | OpenAI | 19.56 | 58.7 | $3.00 |
| 15 | Llama 3.1 70B Instruct | Meta | 15.39 | 13.5 | $0.88 |
| 16 | Claude Sonnet 4 | Anthropic | 12.04 | 36.1 | $3.00 |
| 17 | Gemini 1.5 Pro | Google | 11.10 | 13.9 | $1.25 |
| 18 | o3 | OpenAI | 9.50 | 95.0 | $10.00 |
| 19 | Mistral Large 2 | Mistral | 6.94 | 13.9 | $2.00 |
| 20 | Grok 2 | xAI | 6.77 | 13.5 | $2.00 |
| 21 | Llama 3.1 405B Instruct | Meta | 5.90 | 20.6 | $3.50 |
| 22 | o1 | OpenAI | 5.61 | 84.1 | $15.00 |
| 23 | Claude Opus 4 | Anthropic | 5.33 | 79.9 | $15.00 |
| 24 | Claude 3.5 Sonnet | Anthropic | 4.25 | 12.8 | $3.00 |
| 25 | GPT-4o | OpenAI | 3.93 | 9.8 | $2.50 |
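As a sanity check, the metric can be recomputed from the two published columns. A minimal sketch, assuming the displayed Value is simply the AIME 2024 score divided by the input price per million tokens; small discrepancies against the listed Value come from the one-decimal rounding of the displayed scores:

```python
# Recompute Value = AIME 2024 score (0-100) / input price ($ per 1M tokens).
# Scores below are the rounded figures from the table, so results can drift
# slightly from the listed Value (e.g. 86.7 / 0.20 = 433.50 vs. listed 433.40).

rows = [
    # (model, AIME 2024 score, input price in $ per 1M tokens)
    ("Qwen3 235B", 86.7, 0.20),
    ("o3-mini", 93.2, 1.10),
    ("o3", 95.0, 10.00),
]

for model, score, price in rows:
    print(f"{model}: value = {score / price:.2f}")
```

Running this prints 433.50, 84.73, and 9.50, matching the table within rounding.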
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.