Value ranking
Best value on Safety Handling
How well the model handles safety-sensitive prompts without falsely refusing benign requests or producing unsafe output. The upstream signal does not separate refusal counts from substantive content-safety behaviour, so this single axis covers both.
"Value" is the normalized benchmark score (0–100 for this leaderboard cohort) divided by input price per million tokens. Higher means more capability per dollar on this axis only; always sanity-check latency, context length, and your real workload.
| Rank | Model | Vendor | Value | Score / Input price |
|------|-------|--------|------:|---------------------|
| 1 | DeepSeek V3 | DeepSeek | 370.37 | 100.0 / $0.27/M |
| 2 | Gemini 2.5 Flash | Google | 333.33 | 100.0 / $0.30/M |
| 3 | GLM-4.6 | Zhipu AI (GLM) | 200.00 | 100.0 / $0.50/M |
| 4 | GLM-4.7 | Zhipu AI (GLM) | 200.00 | 100.0 / $0.50/M |
| 5 | DeepSeek R1 | DeepSeek | 181.82 | 100.0 / $0.55/M |
| 6 | Kimi K2 | Moonshot (Kimi) | 166.67 | 100.0 / $0.60/M |
| 7 | GPT-5.2 | OpenAI | 80.00 | 100.0 / $1.25/M |
| 8 | Gemini 3 Pro | Google | 80.00 | 100.0 / $1.25/M |
| 9 | GPT-5.4 | OpenAI | 66.67 | 100.0 / $1.50/M |
| 10 | GPT-5.5 | OpenAI | 66.67 | 100.0 / $1.50/M |
| 11 | Claude Sonnet 4 | Anthropic | 33.33 | 100.0 / $3.00/M |
| 12 | Claude Sonnet 4.5 | Anthropic | 33.33 | 100.0 / $3.00/M |
| 13 | Claude Sonnet 4.6 | Anthropic | 22.49 | 67.5 / $3.00/M |
| 14 | Grok 4 | xAI | 20.00 | 100.0 / $5.00/M |
| 15 | Claude Opus 4.6 | Anthropic | 6.67 | 100.0 / $15.00/M |
| 16 | Claude Opus 4.7 | Anthropic | 6.67 | 100.0 / $15.00/M |
| 17 | Claude Opus 4 | Anthropic | 2.55 | 38.3 / $15.00/M |
| 18 | Claude Opus 4.5 | Anthropic | 0.00 | 0.0 / $15.00/M |
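The "Value" column follows directly from the formula stated above. A minimal sketch of that computation, assuming the leaderboard rounds to two decimal places (the function name is illustrative, not part of any real API):

```python
def value_score(benchmark_score: float, price_per_m_tokens: float) -> float:
    """Capability per dollar on a single axis: normalized score (0-100)
    divided by input price per million tokens. Higher is better."""
    if price_per_m_tokens <= 0:
        raise ValueError("input price must be positive")
    return round(benchmark_score / price_per_m_tokens, 2)

# Reproducing the top-ranked row: 100.0 score at $0.27 per million input tokens
print(value_score(100.0, 0.27))  # 370.37
```

Note that this metric is dominated by price when scores cluster near 100, which is why cheap models lead this ranking regardless of small score differences.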
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.