Value ranking
Best value on Safety Handling
How well the model handles safety-sensitive prompts without falsely refusing benign requests or producing unsafe output. The upstream signal does not separate refusal counts from substantive content-safety behaviour, so this single axis covers both.
"Value" is the normalized benchmark score (0–100 for this leaderboard cohort) divided by input price per million tokens. Higher means more capability per dollar on this axis only; always sanity-check latency, context length, and your real workload.
| Rank | Model | Vendor | Value | Score / Input price |
|------|-------|--------|------:|---------------------|
| 1 | DeepSeek V3 | DeepSeek | 370.37 | 100.0 / $0.27/M |
| 2 | Gemini 2.5 Flash | Google | 333.33 | 100.0 / $0.30/M |
| 3 | GLM-4.6 | Zhipu AI (GLM) | 200.00 | 100.0 / $0.50/M |
| 4 | GLM-4.7 | Zhipu AI (GLM) | 200.00 | 100.0 / $0.50/M |
| 5 | DeepSeek R1 | DeepSeek | 181.82 | 100.0 / $0.55/M |
| 6 | Kimi K2 | Moonshot (Kimi) | 166.67 | 100.0 / $0.60/M |
| 7 | GPT-5.2 | OpenAI | 80.00 | 100.0 / $1.25/M |
| 8 | Gemini 3 Pro | Google | 80.00 | 100.0 / $1.25/M |
| 9 | GPT-5.4 | OpenAI | 66.67 | 100.0 / $1.50/M |
| 10 | GPT-5.5 | OpenAI | 66.67 | 100.0 / $1.50/M |
| 11 | Claude Sonnet 4 | Anthropic | 33.33 | 100.0 / $3.00/M |
| 12 | Claude Sonnet 4.5 | Anthropic | 33.33 | 100.0 / $3.00/M |
| 13 | Claude Sonnet 4.6 | Anthropic | 22.49 | 67.5 / $3.00/M |
| 14 | Grok 4 | xAI | 20.00 | 100.0 / $5.00/M |
| 15 | Claude Opus 4.6 | Anthropic | 6.67 | 100.0 / $15.00/M |
| 16 | Claude Opus 4.7 | Anthropic | 6.67 | 100.0 / $15.00/M |
| 17 | Claude Opus 4 | Anthropic | 2.55 | 38.3 / $15.00/M |
| 18 | Claude Opus 4.5 | Anthropic | 0.00 | 0.0 / $15.00/M |
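The "Value" column follows directly from the formula stated above. A minimal sketch of that computation, assuming the leaderboard rounds to two decimal places (the function name is illustrative, not part of any real API):

```python
def value_score(benchmark_score: float, price_per_m_tokens: float) -> float:
    """Capability per dollar on a single axis: normalized score (0-100)
    divided by input price per million tokens. Higher is better."""
    if price_per_m_tokens <= 0:
        raise ValueError("input price must be positive")
    return round(benchmark_score / price_per_m_tokens, 2)

# Reproducing the top-ranked row: 100.0 score at $0.27 per million input tokens
print(value_score(100.0, 0.27))  # 370.37
```

Note that this metric is dominated by price when scores cluster near 100, which is why cheap models lead this ranking regardless of small score differences.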
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.