AI Model Analyzer

Value ranking

Best value on Format Adherence

How reliably the model produces output in the requested format (JSON schemas, markdown structures, exact-string responses). This axis pairs well with IFEval, but it reflects how the deployed API behaves day to day rather than how a frozen test set scores.

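To make the metric concrete, here is a minimal sketch of one kind of format-adherence check: does the reply parse as JSON and carry exactly the requested keys and types? The schema and sample replies are hypothetical, not drawn from the leaderboard's actual test harness.

```python
import json

# Hypothetical required schema for one test prompt: key -> expected type.
EXPECTED = {"name": str, "year": int, "tags": list}

def adheres(reply: str) -> bool:
    """True if `reply` is valid JSON with exactly the expected keys and types."""
    try:
        obj = json.loads(reply)
    except json.JSONDecodeError:
        return False                      # not parseable JSON at all
    if not isinstance(obj, dict) or set(obj) != set(EXPECTED):
        return False                      # missing or extra keys
    return all(isinstance(obj[k], t) for k, t in EXPECTED.items())

print(adheres('{"name": "Ada", "year": 1843, "tags": ["math"]}'))  # True
print(adheres('Sure! Here is the JSON: {"name": "Ada"}'))          # False

# Scoring many prompts this way yields a 0-100 pass rate per model.
```
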
“Value” is the normalized benchmark score (0–100 for this leaderboard cohort) divided by the input price in dollars per million tokens; a short sketch of the arithmetic follows the table. Higher means more capability per dollar on this axis only. Always sanity-check latency, context length, and performance on your real workload.

Rank  Model              Vendor            Value  Score / Input price
   1  DeepSeek V3        DeepSeek         370.37  100.0 / $0.27/M
   2  Gemini 2.5 Flash   Google           333.33  100.0 / $0.30/M
   3  DeepSeek R1        DeepSeek         181.82  100.0 / $0.55/M
   4  Kimi K2            Moonshot (Kimi)  166.67  100.0 / $0.60/M
   5  GPT-5.2            OpenAI            80.00  100.0 / $1.25/M
   6  Gemini 3 Pro       Google            80.00  100.0 / $1.25/M
   7  GPT-5.4            OpenAI            66.67  100.0 / $1.50/M
   8  GPT-5.5            OpenAI            66.67  100.0 / $1.50/M
   9  Claude Sonnet 4    Anthropic         33.33  100.0 / $3.00/M
  10  Claude Sonnet 4.5  Anthropic         33.33  100.0 / $3.00/M
  11  Claude Sonnet 4.6  Anthropic         32.49   97.5 / $3.00/M
  12  Grok 4             xAI               20.00  100.0 / $5.00/M
  13  Claude Opus 4.6    Anthropic          6.67  100.0 / $15.00/M
  14  Claude Opus 4.7    Anthropic          6.67  100.0 / $15.00/M
  15  Claude Opus 4      Anthropic          6.35   95.2 / $15.00/M
  16  Claude Opus 4.5    Anthropic          6.15   92.2 / $15.00/M
  17  GLM-4.6            Zhipu AI (GLM)     0.00    0.0 / $0.50/M
  18  GLM-4.7            Zhipu AI (GLM)     0.00    0.0 / $0.50/M
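
The arithmetic behind the Value column, as defined above, in a short runnable sketch. The four rows are reproduced from the table; the rest follow the same pattern.

```python
# "Value" = normalized score (0-100) divided by input price ($ per million tokens).
# A few rows reproduced from the table above for illustration.
models = [
    ("DeepSeek V3",      100.0,  0.27),
    ("Gemini 2.5 Flash", 100.0,  0.30),
    ("Claude Opus 4",     95.2, 15.00),
    ("GLM-4.6",            0.0,  0.50),
]

# Rank by capability per dollar, highest first.
for name, score, price in sorted(models, key=lambda m: m[1] / m[2], reverse=True):
    print(f"{name:<18} {score / price:>7.2f}")
# DeepSeek V3         370.37
# Gemini 2.5 Flash    333.33
# Claude Opus 4         6.35
# GLM-4.6               0.00
```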

AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.