Value ranking
Best value on Output Stability
How consistent the model's outputs are across repeated runs of the same task. Higher means lower variance and fewer occasional hallucinations under identical inputs. Useful for production loops that need reproducible behaviour.
“Value” is the normalized benchmark score (0–100 for this leaderboard cohort) divided by the input price per million tokens. Higher means more capability per dollar on this axis only; always sanity-check latency, context length, and your real workload.
| Rank | Model | Vendor | Value | Score / Input price |
|------|-------|--------|-------|---------------------|
| 1 | DeepSeek V3 | DeepSeek | 257.26 | 69.5 / $0.27/M |
| 2 | GLM-4.6 | Zhipu AI (GLM) | 200.00 | 100.0 / $0.50/M |
| 3 | GLM-4.7 | Zhipu AI (GLM) | 200.00 | 100.0 / $0.50/M |
| 4 | Gemini 2.5 Flash | Google | 142.47 | 42.7 / $0.30/M |
| 5 | DeepSeek R1 | DeepSeek | 133.91 | 73.7 / $0.55/M |
| 6 | Kimi K2 | Moonshot (Kimi) | 129.38 | 77.6 / $0.60/M |
| 7 | Gemini 3 Pro | Google | 70.35 | 87.9 / $1.25/M |
| 8 | GPT-5.2 | OpenAI | 62.10 | 77.6 / $1.25/M |
| 9 | GPT-5.4 | OpenAI | 51.75 | 77.6 / $1.50/M |
| 10 | GPT-5.5 | OpenAI | 51.75 | 77.6 / $1.50/M |
| 11 | Claude Sonnet 4 | Anthropic | 32.52 | 97.5 / $3.00/M |
| 12 | Claude Sonnet 4.6 | Anthropic | 30.25 | 90.8 / $3.00/M |
| 13 | Claude Sonnet 4.5 | Anthropic | 24.17 | 72.5 / $3.00/M |
| 14 | Claude Opus 4 | Anthropic | 6.67 | 100.0 / $15.00/M |
| 15 | Claude Opus 4.6 | Anthropic | 6.67 | 100.0 / $15.00/M |
| 16 | Claude Opus 4.7 | Anthropic | 6.67 | 100.0 / $15.00/M |
| 17 | Claude Opus 4.5 | Anthropic | 3.87 | 58.0 / $15.00/M |
| 18 | Grok 4 | xAI | 0.00 | 0.0 / $5.00/M |
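The value metric described above can be sketched in a few lines. This is a minimal illustration, not code from the leaderboard itself; the function name is made up, and note that because the displayed scores and values are rounded, recomputing a row may differ from the table in the second decimal place.

```python
def value_score(benchmark_score: float, price_per_million: float) -> float:
    """Value = normalized benchmark score (0-100) / input price per 1M tokens.

    Higher means more capability per dollar on this single axis.
    """
    if price_per_million <= 0:
        raise ValueError("price per million tokens must be positive")
    return benchmark_score / price_per_million

# Spot-check two rows that round cleanly:
print(round(value_score(100.0, 0.50), 2))   # GLM-4.6: 200.0
print(round(value_score(100.0, 15.00), 2))  # Claude Opus 4: 6.67
```

A zero benchmark score yields a value of 0.00 regardless of price, which is why Grok 4 lands at the bottom of this cohort.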
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.