Value ranking
Best value on MATH-500
500 high-school competition math problems requiring multi-step solutions. Scored on final-answer correctness.
“Value” is the normalized benchmark score (0–100 for this leaderboard cohort) divided by the input price per million tokens. A higher value means more capability per dollar on this benchmark and input pricing only; it ignores output-token pricing, latency, and context length, so always sanity-check those against your real workload.
- #1 Qwen2.5 72B Instruct, Alibaba (Qwen): value 111.11 (score 100.0 / $0.90/M)
- #2 Llama 3.3 70B Instruct, Meta: value 81.89 (score 72.1 / $0.88/M)
- #3 Llama 3.1 70B Instruct, Meta: value 53.48 (score 47.1 / $0.88/M)
- #4 Mixtral 8x22B, Mistral: value 0.00 (score 0.0 / $1.20/M)
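The value metric can be reproduced from the listed numbers with a short sketch. It assumes the scores shown above are already normalized, since the normalization method itself is not stated here. The `Entry` type and the `value` and `rank_by_value` helpers are hypothetical names used for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    model: str
    vendor: str
    score: float        # normalized MATH-500 score, 0-100 within this cohort (as listed)
    input_price: float  # USD per million input tokens


def value(entry: Entry) -> float:
    """Normalized score divided by input price per million tokens."""
    if entry.input_price <= 0:
        raise ValueError("input price must be positive")
    return entry.score / entry.input_price


def rank_by_value(entries: list[Entry]) -> list[Entry]:
    """Sort entries from highest to lowest value."""
    return sorted(entries, key=value, reverse=True)


# Scores and prices copied from the ranking above (scores are rounded to one decimal).
ENTRIES = [
    Entry("Qwen2.5 72B Instruct", "Alibaba (Qwen)", 100.0, 0.90),
    Entry("Llama 3.3 70B Instruct", "Meta", 72.1, 0.88),
    Entry("Llama 3.1 70B Instruct", "Meta", 47.1, 0.88),
    Entry("Mixtral 8x22B", "Mistral", 0.0, 1.20),
]

for rank, e in enumerate(rank_by_value(ENTRIES), start=1):
    print(f"#{rank} {e.model}: {value(e):.2f}")
```

Recomputing from the one-decimal scores gives 81.93 and 53.52 for the two Llama models rather than the listed 81.89 and 53.48. This is consistent with the leaderboard dividing unrounded scores (for example, roughly 72.06 / 0.88), although that is an inference. The ranking order is the same either way.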
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.