Value ranking
Best value on Format Adherence
Format Adherence measures how reliably the model produces output in the requested format (JSON schemas, markdown structures, exact-string responses). It pairs well with IFEval, but reflects how the deployed API behaves day to day rather than how a frozen test set scores.
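To make the metric concrete, here is a minimal sketch of the kind of pass/fail check a format-adherence score implies, assuming a JSON-schema task. The schema and helper are hypothetical, and `jsonschema` is a third-party package; this is not the leaderboard's actual harness.

```python
import json
from jsonschema import validate, ValidationError  # third-party: pip install jsonschema

# Hypothetical schema for one format-adherence test case.
SCHEMA = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["answer", "confidence"],
    "additionalProperties": False,
}

def adheres_to_format(raw_output: str) -> bool:
    """Return True if the model's raw text parses as JSON and matches the schema."""
    try:
        validate(instance=json.loads(raw_output), schema=SCHEMA)
        return True
    except (json.JSONDecodeError, ValidationError):
        return False

# A 0-100 score like the one in the table below would then be the pass rate:
# 100.0 * sum(adheres_to_format(o) for o in outputs) / len(outputs)
```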
“Value” is the normalized benchmark score (0–100 for this leaderboard cohort) divided by input price per million tokens. Higher means more capability per dollar on this axis only; always sanity-check latency, context length, and your real workload.
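As a worked example, here is a minimal sketch of the Value column under the definition above, using four rows from the table below. The `models` list and the printing are illustrative, not the leaderboard's actual pipeline.

```python
# Value = normalized score (0-100) / input price in $ per million tokens.
# Rows taken from the table below: (model, score, input price $/M tokens).
models = [
    ("DeepSeek V3", 100.0, 0.27),
    ("Gemini 2.5 Flash", 100.0, 0.30),
    ("Claude Sonnet 4", 100.0, 3.00),
    ("Claude Opus 4", 95.2, 15.00),
]

# Rank by capability per dollar, highest first.
ranked = sorted(
    ((name, score / price) for name, score, price in models),
    key=lambda row: row[1],
    reverse=True,
)
for rank, (name, value) in enumerate(ranked, start=1):
    print(f"{rank}. {name}: {value:.2f}")
# 1. DeepSeek V3: 370.37
# 2. Gemini 2.5 Flash: 333.33
# 3. Claude Sonnet 4: 33.33
# 4. Claude Opus 4: 6.35
```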
| Rank | Model | Vendor | Value | Score | Input price ($/M tokens) |
|-----:|-------|--------|------:|------:|-------------------------:|
| 1 | DeepSeek V3 | DeepSeek | 370.37 | 100.0 | $0.27 |
| 2 | Gemini 2.5 Flash | Google | 333.33 | 100.0 | $0.30 |
| 3 | DeepSeek R1 | DeepSeek | 181.82 | 100.0 | $0.55 |
| 4 | Kimi K2 | Moonshot (Kimi) | 166.67 | 100.0 | $0.60 |
| 5 | GPT-5.2 | OpenAI | 80.00 | 100.0 | $1.25 |
| 6 | Gemini 3 Pro | Google | 80.00 | 100.0 | $1.25 |
| 7 | GPT-5.4 | OpenAI | 66.67 | 100.0 | $1.50 |
| 8 | GPT-5.5 | OpenAI | 66.67 | 100.0 | $1.50 |
| 9 | Claude Sonnet 4 | Anthropic | 33.33 | 100.0 | $3.00 |
| 10 | Claude Sonnet 4.5 | Anthropic | 33.33 | 100.0 | $3.00 |
| 11 | Claude Sonnet 4.6 | Anthropic | 32.49 | 97.5 | $3.00 |
| 12 | Grok 4 | xAI | 20.00 | 100.0 | $5.00 |
| 13 | Claude Opus 4.6 | Anthropic | 6.67 | 100.0 | $15.00 |
| 14 | Claude Opus 4.7 | Anthropic | 6.67 | 100.0 | $15.00 |
| 15 | Claude Opus 4 | Anthropic | 6.35 | 95.2 | $15.00 |
| 16 | Claude Opus 4.5 | Anthropic | 6.15 | 92.2 | $15.00 |
| 17 | GLM-4.6 | Zhipu AI (GLM) | 0.00 | 0.0 | $0.50 |
| 18 | GLM-4.7 | Zhipu AI (GLM) | 0.00 | 0.0 | $0.50 |
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.