Value ranking
Best value on Frontier Composite
The Frontier Composite is a saturation-resistant capability score stitched together from ~40 underlying benchmarks using Item Response Theory (IRT). Each benchmark is weighted by its fitted difficulty and discrimination slope, so doing well on hard, contamination-resistant evals (FrontierMath, ARC-AGI 2, Humanity's Last Exam) moves the score, while saturated benchmarks contribute almost nothing. Scores are imported per-model from Epoch AI's published index; we anchor them to the same min-max scale we use for every other benchmark so they're directly weightable in scenarios.
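To make the weighting concrete, here is a minimal sketch of a two-parameter-logistic (2PL) IRT fit. This is not Epoch AI's actual pipeline: the benchmark names, item parameters, and accuracies below are hypothetical, and each benchmark accuracy is treated as a fractional item response. It shows why saturated benchmarks barely matter: their pass probability is near 1 at every plausible ability level, so the likelihood is flat in them.

```python
import math

def two_pl(theta: float, a: float, b: float) -> float:
    """2PL item response curve: P(pass | ability theta) for an item
    with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fit_ability(accuracies: dict[str, float],
                items: dict[str, tuple[float, float]]) -> float:
    """Grid-search the ability theta that maximizes a binomial-style
    log-likelihood of the observed per-benchmark accuracies."""
    def loglik(theta: float) -> float:
        return sum(
            acc * math.log(two_pl(theta, *items[name]))
            + (1.0 - acc) * math.log(1.0 - two_pl(theta, *items[name]))
            for name, acc in accuracies.items()
        )
    grid = [t / 100.0 for t in range(-400, 401)]
    return max(grid, key=loglik)

def anchor(theta: float, cohort_thetas: list[float]) -> float:
    """Min-max anchor a fitted ability onto the 0-100 leaderboard scale."""
    lo, hi = min(cohort_thetas), max(cohort_thetas)
    return 100.0 * (theta - lo) / (hi - lo)

# Hypothetical items: (discrimination a, difficulty b). The easy,
# near-saturated item contributes almost nothing to the fit.
items = {"FrontierMath": (1.5, 2.5), "ARC-AGI 2": (1.2, 1.8), "MMLU": (1.0, -2.0)}
accs = {"FrontierMath": 0.25, "ARC-AGI 2": 0.40, "MMLU": 0.92}
theta = fit_ability(accs, items)
```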
“Value” is the normalized benchmark score (0–100 within this leaderboard cohort) divided by input price per million tokens. Higher means more capability per dollar on this axis only; always sanity-check latency, context length, and performance on your real workload. A sketch that reproduces the calculation follows the table.
| Rank | Model | Vendor | Value | Score (0–100) | Input price |
|-----:|-------|--------|------:|--------------:|------------:|
| 1 | GPT-5 nano | OpenAI | 1052.80 | 52.6 | $0.05/M |
| 2 | Gemini 2.0 Flash | Google | 398.00 | 39.8 | $0.10/M |
| 3 | Gemini 1.5 Flash | Google | 333.20 | 25.0 | $0.08/M |
| 4 | Qwen3 235B (Thinking) | Alibaba (Qwen) | 331.65 | 66.3 | $0.20/M |
| 5 | Gemini 3 Flash | Google | 269.93 | 81.0 | $0.30/M |
| 6 | GPT-5 mini | OpenAI | 265.28 | 66.3 | $0.25/M |
| 7 | DeepSeek V3 | DeepSeek | 255.44 | 69.0 | $0.27/M |
| 8 | Qwen3 235B | Alibaba (Qwen) | 251.50 | 50.3 | $0.20/M |
| 9 | DeepSeek V3 (Thinking) | DeepSeek | 241.22 | 65.1 | $0.27/M |
| 10 | Gemini 2.5 Flash | Google | 201.10 | 60.3 | $0.30/M |
| 11 | Llama 4 Scout | Meta | 141.17 | 25.4 | $0.18/M |
| 12 | GLM-4.7 | Zhipu AI (GLM) | 127.26 | 63.6 | $0.50/M |
| 13 | DeepSeek R1 | DeepSeek | 125.40 | 69.0 | $0.55/M |
| 14 | Kimi K2 | Moonshot (Kimi) | 122.70 | 73.6 | $0.60/M |
| 15 | Llama 4 Maverick | Meta | 119.52 | 32.3 | $0.27/M |
| 16 | GLM-4.6 | Zhipu AI (GLM) | 111.60 | 55.8 | $0.50/M |
| 17 | GPT-4o mini | OpenAI | 105.20 | 15.8 | $0.15/M |
| 18 | Gemini 3 Pro | Google | 77.80 | 97.3 | $1.25/M |
| 19 | GPT-5.2 | OpenAI | 70.94 | 88.7 | $1.25/M |
| 20 | GPT-5.5 | OpenAI | 66.67 | 100.0 | $1.50/M |
| 21 | GPT-5.4 | OpenAI | 63.58 | 95.4 | $1.50/M |
| 22 | o4-mini | OpenAI | 63.55 | 69.9 | $1.10/M |
| 23 | GPT-5 | OpenAI | 62.76 | 78.5 | $1.25/M |
| 24 | GPT-5.1 | OpenAI | 62.10 | 77.6 | $1.25/M |
| 25 | Claude Haiku 4.5 | Anthropic | 59.70 | 59.7 | $1.00/M |
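As a cross-check, the value column is just the score-to-price ratio, re-sorted descending. A minimal sketch over the first few rows (the score and price literals below are the table's rounded display values, so recomputed figures can drift slightly from the listed ones, e.g. 52.6 / 0.05 = 1052.0 against the listed 1052.80, which was computed from the unrounded score):

```python
# (model, normalized score 0-100, input price in $ per million tokens)
rows = [
    ("GPT-5 nano", 52.6, 0.05),
    ("Gemini 2.0 Flash", 39.8, 0.10),
    ("Gemini 3 Flash", 81.0, 0.30),
]

# Value = score / price; best value (most capability per dollar) first.
ranked = sorted(
    ((name, score / price) for name, score, price in rows),
    key=lambda row: row[1],
    reverse=True,
)
for rank, (name, value) in enumerate(ranked, start=1):
    print(f"{rank}. {name}: {value:.2f}")
```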
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.