Value ranking
Best value on SWE-bench Verified
SWE-bench scores models on real GitHub issues solved end-to-end; the Verified subset is a human-validated slice of 500 tasks.
“Value” is the normalized benchmark score (0–100 within this leaderboard cohort) divided by the input price per million tokens. A higher value means more capability per dollar on this axis only; always sanity-check latency, context length, and performance on your real workload as well. A short recomputation sketch follows the table.
| Rank | Model | Vendor | Value (score ÷ price) | SWE-bench Verified score | Input price ($/M tokens) |
|-----:|-------|--------|----------------------:|-------------------------:|-------------------------:|
| 1 | GPT-5 nano | OpenAI | 648.00 | 32.4 | $0.05 |
| 2 | DeepSeek V3 | DeepSeek | 318.48 | 86.0 | $0.27 |
| 3 | Gemini 3 Flash | Google | 316.07 | 94.8 | $0.30 |
| 4 | GPT-5 mini | OpenAI | 281.84 | 70.5 | $0.25 |
| 5 | Kimi K2 | Moonshot (Kimi) | 146.37 | 87.8 | $0.60 |
| 6 | GLM-4.6 | Zhipu AI (GLM) | 127.52 | 63.8 | $0.50 |
| 7 | Claude Haiku 4.5 | Anthropic | 80.82 | 80.8 | $1.00 |
| 8 | Gemini 3 Pro | Google | 77.81 | 97.3 | $1.25 |
| 9 | Gemini 2.5 Flash | Google | 77.20 | 23.2 | $0.30 |
| 10 | GPT-5 | OpenAI | 74.15 | 92.7 | $1.25 |
| 11 | GPT-5.2 | OpenAI | 72.21 | 90.3 | $1.25 |
| 12 | o4-mini | OpenAI | 70.70 | 77.8 | $1.10 |
| 13 | Gemini 2.5 Pro | Google | 48.82 | 61.0 | $1.25 |
| 14 | o3-mini | OpenAI | 39.97 | 44.0 | $1.10 |
| 15 | Claude Sonnet 4 | Anthropic | 32.12 | 96.3 | $3.00 |
| 16 | Claude Sonnet 4.5 | Anthropic | 31.10 | 93.3 | $3.00 |
| 17 | Claude 3.7 Sonnet | Anthropic | 26.84 | 80.5 | $3.00 |
| 18 | Claude 3.5 Sonnet | Anthropic | 25.31 | 75.9 | $3.00 |
| 19 | GPT-4o | OpenAI | 15.40 | 38.5 | $2.50 |
| 20 | Claude Opus 4.5 | Anthropic | 6.67 | 100.0 | $15.00 |
| 21 | Claude Opus 4.6 | Anthropic | 6.30 | 94.5 | $15.00 |
| 22 | Claude Opus 4 | Anthropic | 6.06 | 90.9 | $15.00 |
| 23 | Claude 3 Opus | Anthropic | 0.23 | 3.5 | $15.00 |
| 24 | Gemini 2.0 Flash | Google | 0.00 | 0.0 | $0.10 |
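As a sanity check, the value column can be recomputed from the score and price columns alone. The sketch below is plain Python with a few scores and prices copied from the table; it is an illustration, not the leaderboard's actual pipeline. The published column appears to divide unrounded scores, so recomputed values can differ from it by a few hundredths.

```python
# Recompute "value" as defined above:
#   value = normalized benchmark score / input price per million tokens.
# Scores (0-100) and prices ($/M input tokens) are copied from the table.

rows = [
    # (model, SWE-bench Verified score, input price in $/M tokens)
    ("GPT-5 nano", 32.4, 0.05),
    ("DeepSeek V3", 86.0, 0.27),
    ("Claude Haiku 4.5", 80.8, 1.00),
    ("Claude Opus 4.5", 100.0, 15.00),
]

for model, score, price in rows:
    value = score / price
    # Small deviations from the published column (e.g. 318.52 vs. 318.48)
    # suggest the leaderboard divides scores before rounding them.
    print(f"{model}: {value:.2f}")
```

Running this prints 648.00 for GPT-5 nano and 6.67 for Claude Opus 4.5, matching the table, which confirms the ratio is exactly score over input price with no other weighting.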
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.