AMA

Value ranking

Best value on Terminal-Bench 2

Long-horizon shell-and-filesystem tasks executed in a sandboxed terminal, scored by whether the agent's final state matches a target state. Tests practical tool-using ability for everyday devops and data-wrangling work; one of the hardest agentic benchmarks today.

“Value” is normalized benchmark score (0–100 for this leaderboard cohort) divided by input price per million tokens. Higher means more capability per dollar on this axis only — always sanity-check latency, context length, and your real workload.

  1. 1
    Gemini 3 Flash
    Google
    215.23
    64.6 / $0.30/M
  2. 2
    GPT-5 nano
    OpenAI
    128.60
    6.4 / $0.05/M
  3. 3
    DeepSeek V3
    DeepSeek
    114.00
    30.8 / $0.27/M
  4. 4
    GPT-5 mini
    OpenAI
    96.84
    24.2 / $0.25/M
  5. 5
    Gemini 3 Pro
    Google
    69.06
    86.3 / $1.25/M
  6. 6
    GPT-5.5
    OpenAI
    61.65
    92.5 / $1.50/M
  7. 7
    Kimi K2
    Moonshot (Kimi)
    59.50
    35.7 / $0.60/M
  8. 8
    GPT-5.4
    OpenAI
    59.01
    88.5 / $1.50/M
  9. 9
    GPT-5.2
    OpenAI
    52.31
    65.4 / $1.25/M
  10. 10
    GLM-4.7
    Zhipu AI (GLM)
    44.60
    22.3 / $0.50/M
  11. 11
    GPT-5
    OpenAI
    35.57
    44.5 / $1.25/M
  12. 12
    GPT-5.1
    OpenAI
    33.38
    41.7 / $1.25/M
  13. 13
    Claude Haiku 4.5
    Anthropic
    25.17
    25.2 / $1.00/M
  14. 14
    GLM-4.6
    Zhipu AI (GLM)
    20.24
    10.1 / $0.50/M
  15. 15
    Gemini 2.5 Pro
    Google
    16.96
    21.2 / $1.25/M
  16. 16
    Claude Sonnet 4.6
    Anthropic
    16.55
    49.7 / $3.00/M
  17. 17
    Claude Sonnet 4.5
    Anthropic
    11.72
    35.2 / $3.00/M
  18. 18
    Claude Opus 4.7
    Anthropic
    6.67
    100.0 / $15.00/M
  19. 19
    Claude Opus 4.6
    Anthropic
    5.72
    85.8 / $15.00/M
  20. 20
    Claude Opus 4.5
    Anthropic
    4.20
    62.9 / $15.00/M
  21. 21
    Grok 4
    xAI
    2.76
    13.8 / $5.00/M
  22. 22
    Claude Opus 4
    Anthropic
    1.91
    28.6 / $15.00/M
  23. 23
    Gemini 2.5 Flash
    Google
    0.00
    0.0 / $0.30/M

AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.