
Value ranking

Best value on SWE-bench Verified

SWE-bench Verified measures whether a model can resolve real GitHub issues end to end. The Verified subset is a 500-task, human-validated slice of SWE-bench.

“Value” is the normalized benchmark score (0–100 within this leaderboard's cohort) divided by the input price in USD per million tokens. A higher value means more benchmark score per dollar on this axis only. It ignores output-token pricing, latency, and context length, so always check those against your real workload.
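The value metric is a single division, so it is easy to reproduce. Below is a minimal Python sketch that uses three rows from the ranking. The function names are illustrative. The page does not say how it normalizes raw scores to 0–100. The top model sits at 100.0 and the bottom at 0.0, which is consistent with min-max scaling across the cohort, but `min_max_normalize` is an assumption, and its inputs are hypothetical.

```python
def value_per_dollar(normalized_score: float, input_price_per_m: float) -> float:
    """Normalized score (0-100) divided by input price in USD per 1M tokens."""
    if input_price_per_m <= 0:
        raise ValueError("input price must be positive")
    return normalized_score / input_price_per_m


def min_max_normalize(raw: dict[str, float]) -> dict[str, float]:
    """ASSUMPTION: rescale raw cohort scores to 0-100 (lowest -> 0, highest -> 100).
    The leaderboard does not document its normalization method."""
    lo, hi = min(raw.values()), max(raw.values())
    span = hi - lo
    return {name: 0.0 if span == 0 else 100.0 * (s - lo) / span for name, s in raw.items()}


# Displayed (one-decimal) scores and input prices copied from the ranking.
rows = {
    "GPT-5 nano": (32.4, 0.05),
    "Claude Haiku 4.5": (80.8, 1.00),
    "Claude Opus 4.5": (100.0, 15.00),
}
for name, (score, price) in rows.items():
    print(f"{name}: {value_per_dollar(score, price):.2f}")

# Hypothetical raw scores, for illustration only.
print(min_max_normalize({"model_a": 20.0, "model_b": 50.0, "model_c": 80.0}))
```

This prints 648.00, 80.80, and 6.67 for the three models. The list shows 80.82 for Claude Haiku 4.5, which suggests that the page divides the unrounded score. Recomputing from the displayed one-decimal scores can therefore differ in the last digit.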

Format: model, vendor: value (normalized score / input price in USD per 1M tokens)

  1. GPT-5 nano, OpenAI: 648.00 (32.4 / $0.05/M)
  2. DeepSeek V3, DeepSeek: 318.48 (86.0 / $0.27/M)
  3. Gemini 3 Flash, Google: 316.07 (94.8 / $0.30/M)
  4. GPT-5 mini, OpenAI: 281.84 (70.5 / $0.25/M)
  5. Kimi K2, Moonshot (Kimi): 146.37 (87.8 / $0.60/M)
  6. GLM-4.6, Zhipu AI (GLM): 127.52 (63.8 / $0.50/M)
  7. Claude Haiku 4.5, Anthropic: 80.82 (80.8 / $1.00/M)
  8. Gemini 3 Pro, Google: 77.81 (97.3 / $1.25/M)
  9. Gemini 2.5 Flash, Google: 77.20 (23.2 / $0.30/M)
  10. GPT-5, OpenAI: 74.15 (92.7 / $1.25/M)
  11. GPT-5.2, OpenAI: 72.21 (90.3 / $1.25/M)
  12. o4-mini, OpenAI: 70.70 (77.8 / $1.10/M)
  13. Gemini 2.5 Pro, Google: 48.82 (61.0 / $1.25/M)
  14. o3-mini, OpenAI: 39.97 (44.0 / $1.10/M)
  15. Claude Sonnet 4, Anthropic: 32.12 (96.3 / $3.00/M)
  16. Claude Sonnet 4.5, Anthropic: 31.10 (93.3 / $3.00/M)
  17. Claude 3.7 Sonnet, Anthropic: 26.84 (80.5 / $3.00/M)
  18. Claude 3.5 Sonnet, Anthropic: 25.31 (75.9 / $3.00/M)
  19. GPT-4o, OpenAI: 15.40 (38.5 / $2.50/M)
  20. Claude Opus 4.5, Anthropic: 6.67 (100.0 / $15.00/M)
  21. Claude Opus 4.6, Anthropic: 6.30 (94.5 / $15.00/M)
  22. Claude Opus 4, Anthropic: 6.06 (90.9 / $15.00/M)
  23. Claude 3 Opus, Anthropic: 0.23 (3.5 / $15.00/M)
  24. Gemini 2.0 Flash, Google: 0.00 (0.0 / $0.10/M)
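Value divides by input price alone, so a very cheap model with a modest score can top the ranking. In this list, GPT-5 nano leads with a normalized score of 32.4. One way to act on the advice to sanity-check against your workload is to set a minimum score first and then rank by value only among the models that clear it. The Python sketch below does this over a subset of the rows above. The `Entry` type and the `best_value` helper are hypothetical, and the floor of 80 is an arbitrary example, not a recommendation.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    model: str
    vendor: str
    score: float        # normalized SWE-bench Verified score, 0-100 (as displayed)
    input_price: float  # USD per 1M input tokens

    @property
    def value(self) -> float:
        # Same definition as the leaderboard: score / input price.
        return self.score / self.input_price


# A subset of the ranking, copied from the list above.
ENTRIES = [
    Entry("GPT-5 nano", "OpenAI", 32.4, 0.05),
    Entry("DeepSeek V3", "DeepSeek", 86.0, 0.27),
    Entry("Gemini 3 Flash", "Google", 94.8, 0.30),
    Entry("GPT-5 mini", "OpenAI", 70.5, 0.25),
    Entry("Kimi K2", "Moonshot (Kimi)", 87.8, 0.60),
    Entry("Claude Haiku 4.5", "Anthropic", 80.8, 1.00),
    Entry("Claude Opus 4.5", "Anthropic", 100.0, 15.00),
]


def best_value(entries: list[Entry], min_score: float = 0.0) -> list[Entry]:
    """Keep entries at or above min_score, then sort by value, highest first."""
    eligible = [e for e in entries if e.score >= min_score]
    return sorted(eligible, key=lambda e: e.value, reverse=True)


for e in best_value(ENTRIES, min_score=80.0):
    print(f"{e.model:<18} {e.value:8.2f}")
```

With no floor, GPT-5 nano ranks first, as it does in the list. With an 80-point floor, it drops out, and DeepSeek V3 and Gemini 3 Flash lead. The floor is a blunt filter. A more careful comparison would also weigh output pricing and latency, which this metric leaves out.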

AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.