Value ranking
Best value on RULER 128k
Long-context retrieval and reasoning suite. We report the 128k-token effective-context score.
“Value” is the normalized benchmark score (0–100 for this leaderboard cohort) divided by the input price per million tokens. Higher means more capability per dollar on this axis alone; always sanity-check latency, context length, and performance on your real workload.
| Rank | Model | Vendor | Value | RULER 128k / Input price |
|---:|---|---|---:|---|
| 1 | Gemini 1.5 Flash | Google | 1116.27 | 83.7 / $0.08/M |
| 2 | Gemini 2.0 Flash | Google | 953.50 | 95.3 / $0.10/M |
| 3 | GPT-4o mini | OpenAI | 465.13 | 69.8 / $0.15/M |
| 4 | Qwen3 235B | Alibaba (Qwen) | 418.60 | 83.7 / $0.20/M |
| 5 | Llama 4 Scout | Meta | 400.50 | 72.1 / $0.18/M |
| 6 | Llama 4 Maverick | Meta | 327.30 | 88.4 / $0.27/M |
| 7 | DeepSeek V3 | DeepSeek | 292.85 | 79.1 / $0.27/M |
| 8 | DeepSeek R1 | DeepSeek | 152.22 | 83.7 / $0.55/M |
| 9 | Claude 3.5 Haiku | Anthropic | 87.21 | 69.8 / $0.80/M |
| 10 | Gemini 2.5 Pro | Google | 80.00 | 100.0 / $1.25/M |
| 11 | Gemini 1.5 Pro | Google | 78.14 | 97.7 / $1.25/M |
| 12 | o3-mini | OpenAI | 76.11 | 83.7 / $1.10/M |
| 13 | Llama 3.3 70B Instruct | Meta | 66.07 | 58.1 / $0.88/M |
| 14 | Qwen2.5 72B Instruct | Alibaba (Qwen) | 64.60 | 58.1 / $0.90/M |
| 15 | Llama 3.1 70B Instruct | Meta | 55.50 | 48.8 / $0.88/M |
| 16 | Grok 2 | xAI | 34.88 | 69.8 / $2.00/M |
| 17 | GPT-4o | OpenAI | 33.49 | 83.7 / $2.50/M |
| 18 | Claude Sonnet 4 | Anthropic | 31.01 | 93.0 / $3.00/M |
| 19 | Grok 3 | xAI | 31.01 | 93.0 / $3.00/M |
| 20 | Claude 3.5 Sonnet | Anthropic | 29.46 | 88.4 / $3.00/M |
| 21 | o1-mini | OpenAI | 26.36 | 79.1 / $3.00/M |
| 22 | Mistral Large 2 | Mistral | 23.25 | 46.5 / $2.00/M |
| 23 | Llama 3.1 405B Instruct | Meta | 16.61 | 58.1 / $3.50/M |
| 24 | o3 | OpenAI | 9.77 | 97.7 / $10.00/M |
| 25 | Claude Opus 4 | Anthropic | 6.51 | 97.7 / $15.00/M |
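The value column above can be reproduced directly from the formula stated earlier: benchmark score divided by input price per million tokens. A minimal sketch, using a few rows from the table (note the displayed prices are rounded, so recomputed values can differ slightly from the published ones, e.g. for Gemini 1.5 Flash):

```python
# Reproduce the "value" ranking: value = RULER 128k score / input $/M tokens.
# Figures are taken from the table above; prices shown there are rounded,
# so recomputed values may not match the published column exactly.

rows = [
    # (model, RULER 128k score, input price in $ per million tokens)
    ("Gemini 2.0 Flash", 95.3, 0.10),
    ("GPT-4o mini", 69.8, 0.15),
    ("Claude Opus 4", 97.7, 15.00),
]

# Compute value per dollar and sort best-first.
ranked = sorted(
    ((name, round(score / price, 2)) for name, score, price in rows),
    key=lambda r: r[1],
    reverse=True,
)

for name, value in ranked:
    print(f"{name}: {value}")
```

Cheap models dominate this metric because the denominator swamps modest score differences: a model scoring 95 at $0.10/M outranks one scoring 98 at $10/M by two orders of magnitude, which is why the score/price split is shown alongside the single value number.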
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.