Value ranking
Best value on MMMU
Massive Multi-discipline Multimodal Understanding: college-exam-level questions with images across 30+ subjects.
“Value” is the normalized benchmark score (0–100 within this leaderboard cohort) divided by the input price per million tokens; a worked example follows the table below. A higher value means more capability per dollar on this axis only: always sanity-check latency, context length, and your real workload.
| Rank | Model | Vendor | Value (score ÷ price) | MMMU score (normalized) | Input price (per 1M tokens) |
|---|---|---|---|---|---|
| 1 | Gemini 2.0 Flash | Google | 582.10 | 58.2 | $0.10 |
| 2 | Llama 4 Scout | Meta | 275.72 | 49.6 | $0.18 |
| 3 | Llama 4 Maverick | Meta | 239.07 | 64.5 | $0.27 |
| 4 | GPT-4o mini | OpenAI | 82.07 | 12.3 | $0.15 |
| 5 | Gemini 2.5 Pro | Google | 76.42 | 95.5 | $1.25 |
| 6 | o3-mini | OpenAI | 74.29 | 81.7 | $1.10 |
| 7 | Gemini 1.5 Pro | Google | 29.26 | 36.6 | $1.25 |
| 8 | Grok 3 | xAI | 27.24 | 81.7 | $3.00 |
| 9 | Claude Sonnet 4 | Anthropic | 23.51 | 70.5 | $3.00 |
| 10 | GPT-4o | OpenAI | 19.40 | 48.5 | $2.50 |
| 11 | Grok 2 | xAI | 18.66 | 37.3 | $2.00 |
| 12 | o1-mini | OpenAI | 18.53 | 55.6 | $3.00 |
| 13 | Claude 3.5 Sonnet | Anthropic | 17.79 | 53.4 | $3.00 |
| 14 | o3 | OpenAI | 10.00 | 100.0 | $10.00 |
| 15 | Claude Opus 4 | Anthropic | 5.95 | 89.2 | $15.00 |
| 16 | o1 | OpenAI | 5.50 | 82.5 | $15.00 |
| 17 | Claude 3 Opus | Anthropic | 0.82 | 12.3 | $15.00 |
| 18 | Gemini 1.5 Flash | Google | 0.00 | 0.0 | $0.08 |
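
To make the ranking formula concrete, here is a minimal sketch of the calculation, assuming "value" is a plain ratio of the normalized score to the input price. The `value_score` helper is hypothetical (not part of any tool or API), and the numbers are copied from the o3 row of the table above.

```python
# Minimal sketch of the "value" formula described above, assuming a plain ratio
# of normalized score to input price. The function name is illustrative only.

def value_score(normalized_score: float, input_price_per_mtok: float) -> float:
    """Benchmark points per dollar of input tokens (higher is better)."""
    return normalized_score / input_price_per_mtok

# Worked example using the o3 row from the table: 100.0 / $10.00 per 1M tokens.
print(f"{value_score(100.0, 10.00):.2f}")  # 10.00
```

Where a displayed value differs slightly from this ratio (e.g. 582.10 versus 58.2 / 0.10 = 582.0), the gap is presumably rounding in the displayed score or price, not a different formula.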
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.