Value ranking
Best value on SimpleQA Verified
A human-validated factuality benchmark of short factual questions whose answers can be checked against a single ground truth. It penalizes hallucinations by scoring confidently wrong answers below abstentions, as illustrated in the sketch below.
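To make that grading property concrete, here is a toy Python sketch. The +1/0/−1 values are illustrative assumptions only; the benchmark's actual rubric is not given on this page. The point is simply that a wrong answer scores strictly below an abstention.

```python
# Toy per-question grading rule illustrating the property described
# above: a confidently wrong answer scores below an abstention.
# The specific penalty of -1.0 is an assumption for illustration;
# the benchmark's actual rubric is not specified here.

def grade(answer: str | None, ground_truth: str) -> float:
    if answer is None:          # abstention ("I don't know")
        return 0.0
    if answer == ground_truth:  # correct answer
        return 1.0
    return -1.0                 # wrong answer: worse than abstaining
```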
“Value” is the normalized benchmark score (0–100 within this leaderboard cohort) divided by the input price per million tokens. Higher means more capability per dollar on this axis only; always sanity-check latency, context length, and performance on your real workload. A minimal sketch of the computation follows the table.
| Rank | Model | Vendor | Value (score / price) | Score (0–100) | Input price (per M tokens) |
|---:|---|---|---:|---:|---:|
| 1 | Qwen3 235B (Thinking) | Alibaba (Qwen) | 309.55 | 61.9 | $0.20 |
| 2 | Gemini 3 Flash | Google | 287.10 | 86.1 | $0.30 |
| 3 | GPT-5 nano | OpenAI | 176.40 | 8.8 | $0.05 |
| 4 | GPT-5 mini | OpenAI | 84.60 | 21.1 | $0.25 |
| 5 | Gemini 3 Pro | Google | 80.00 | 100.0 | $1.25 |
| 6 | GLM-4.7 | Zhipu AI (GLM) | 71.70 | 35.9 | $0.50 |
| 7 | Kimi K2 | Moonshot (Kimi) | 65.37 | 39.2 | $0.60 |
| 8 | Gemini 2.5 Pro | Google | 56.14 | 70.2 | $1.25 |
| 9 | DeepSeek R1 | DeepSeek | 55.00 | 30.3 | $0.55 |
| 10 | GPT-5.5 | OpenAI | 53.41 | 80.1 | $1.50 |
| 11 | GPT-5 | OpenAI | 50.09 | 62.6 | $1.25 |
| 12 | GPT-5.1 | OpenAI | 48.18 | 60.2 | $1.25 |
| 13 | GPT-5.2 | OpenAI | 36.98 | 46.2 | $1.25 |
| 14 | GPT-5.4 | OpenAI | 36.35 | 54.5 | $1.50 |
| 15 | o4-mini | OpenAI | 22.92 | 25.2 | $1.10 |
| 16 | Grok 4 | xAI | 11.76 | 58.8 | $5.00 |
| 17 | Claude Sonnet 4.6 | Anthropic | 10.78 | 32.4 | $3.00 |
| 18 | Claude Sonnet 4.5 | Anthropic | 8.26 | 24.8 | $3.00 |
| 19 | o3 | OpenAI | 6.60 | 66.0 | $10.00 |
| 20 | Claude Opus 4.7 | Anthropic | 4.17 | 62.6 | $15.00 |
| 21 | Claude Opus 4.6 | Anthropic | 3.79 | 56.8 | $15.00 |
| 22 | Claude Opus 4.5 | Anthropic | 3.35 | 50.3 | $15.00 |
| 23 | Claude Opus 4 | Anthropic | 2.70 | 40.5 | $15.00 |
| 24 | Claude 3.5 Haiku | Anthropic | 1.40 | 1.1 | $0.80 |
| 25 | Claude Haiku 4.5 | Anthropic | 0.00 | 0.0 | $1.00 |
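For concreteness, here is a minimal Python sketch of the value computation, using four rows from the table. The exact normalization method is not stated on this page, so the 0–100 scores are taken as given.

```python
# Minimal sketch of the "value" metric defined above: normalized
# benchmark score divided by input price per million tokens.
# Figures are copied from the table; scores are already normalized
# to the cohort's 0-100 range.

def value(normalized_score: float, price_per_mtok: float) -> float:
    """Capability per dollar on this single axis."""
    return normalized_score / price_per_mtok

# (normalized score, input price in $ per million tokens)
models = {
    "Qwen3 235B (Thinking)": (61.9, 0.20),
    "Gemini 3 Flash": (86.1, 0.30),
    "Gemini 3 Pro": (100.0, 1.25),
    "Claude Haiku 4.5": (0.0, 1.00),
}

for name, (score, price) in sorted(
    models.items(), key=lambda kv: value(*kv[1]), reverse=True
):
    print(f"{name}: {value(score, price):.2f}")
# Qwen3 235B (Thinking): 309.50
# Gemini 3 Flash: 287.00
# Gemini 3 Pro: 80.00
# Claude Haiku 4.5: 0.00
```

The small differences from the table's Value column (e.g. 309.50 here vs. 309.55 in the table) suggest the leaderboard computes the ratio from unrounded scores before display.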
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.