AMA

Value ranking

Best value on Output Stability

Measures how consistent a model's outputs are across repeated runs of the same task. A higher score means lower run-to-run variance and fewer occasional hallucinations under identical inputs, which matters for production loops that need reproducible behaviour.
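
AMA does not publish the exact methodology behind this metric. As a rough illustration only, one way to quantify run-to-run consistency is the pairwise exact-match agreement rate across repeated generations of the same prompt. The sketch below assumes that definition; the function name and the agreement measure are hypothetical, not AMA's scoring code.

```python
from itertools import combinations

def pairwise_agreement(outputs: list[str]) -> float:
    """Fraction of run pairs whose outputs match exactly.

    1.0 means every repeated run produced identical text; lower values
    mean higher run-to-run variance. Illustrative metric only, assumed
    for this sketch; it is not AMA's published methodology.
    """
    if len(outputs) < 2:
        return 1.0  # a single run is trivially self-consistent
    pairs = list(combinations(outputs, 2))
    matches = sum(a == b for a, b in pairs)
    return matches / len(pairs)

# Five repeated runs of the same prompt, one divergent output:
runs = ["42", "42", "42", "forty-two", "42"]
print(f"{pairwise_agreement(runs):.2f}")  # 0.60
```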

“Value” is the normalized benchmark score (0–100 within this leaderboard cohort) divided by the input price per million tokens; a worked example follows the table. A higher value means more capability per dollar on this axis only, so always sanity-check latency, context length, and performance on your real workload.

Rank  Model              Vendor            Value  Score / Input price
   1  DeepSeek V3        DeepSeek         257.26  69.5 / $0.27/M
   2  GLM-4.6            Zhipu AI (GLM)   200.00  100.0 / $0.50/M
   3  GLM-4.7            Zhipu AI (GLM)   200.00  100.0 / $0.50/M
   4  Gemini 2.5 Flash   Google           142.47  42.7 / $0.30/M
   5  DeepSeek R1        DeepSeek         133.91  73.7 / $0.55/M
   6  Kimi K2            Moonshot (Kimi)  129.38  77.6 / $0.60/M
   7  Gemini 3 Pro       Google            70.35  87.9 / $1.25/M
   8  GPT-5.2            OpenAI            62.10  77.6 / $1.25/M
   9  GPT-5.4            OpenAI            51.75  77.6 / $1.50/M
  10  GPT-5.5            OpenAI            51.75  77.6 / $1.50/M
  11  Claude Sonnet 4    Anthropic         32.52  97.5 / $3.00/M
  12  Claude Sonnet 4.6  Anthropic         30.25  90.8 / $3.00/M
  13  Claude Sonnet 4.5  Anthropic         24.17  72.5 / $3.00/M
  14  Claude Opus 4      Anthropic          6.67  100.0 / $15.00/M
  15  Claude Opus 4.6    Anthropic          6.67  100.0 / $15.00/M
  16  Claude Opus 4.7    Anthropic          6.67  100.0 / $15.00/M
  17  Claude Opus 4.5    Anthropic          3.87  58.0 / $15.00/M
  18  Grok 4             xAI                0.00  0.0 / $5.00/M
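
To make the arithmetic concrete, the minimal sketch below recomputes the value figure for a few rows, with scores and prices copied from the table above. Recomputed values can differ from the displayed ones by a few hundredths because the table shows scores rounded to one decimal place.

```python
# Value = normalized benchmark score (0-100) / input price in $ per
# million tokens, per the definition above. Scores and prices are
# copied from the table; the leaderboard's displayed values may
# differ slightly since it presumably uses unrounded scores.
models = [
    ("DeepSeek V3", 69.5, 0.27),
    ("GLM-4.6", 100.0, 0.50),
    ("Gemini 2.5 Flash", 42.7, 0.30),
    ("Claude Opus 4", 100.0, 15.00),
]

# Sort by value, highest capability-per-dollar first.
for name, score, price in sorted(models, key=lambda m: m[1] / m[2], reverse=True):
    print(f"{name:<18} {score / price:>8.2f}")
# DeepSeek V3 ranks first at ~257.41; Claude Opus 4 last at 6.67.
```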

AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.