Value ranking
Best value on IFEval
A benchmark of verifiable instruction following, covering 25 categories of strict formatting and structural directives.
“Value” is normalized benchmark score (0–100 for this leaderboard cohort) divided by input price per million tokens. Higher means more capability per dollar on this axis only — always sanity-check latency, context length, and your real workload.
- 1. Llama 3.3 70B Instruct (Meta): 113.64 (100.0 / $0.88/M)
- 2. Llama 3.1 70B Instruct (Meta): 93.05 (81.9 / $0.88/M)
- 3. Qwen2.5 72B Instruct (Alibaba (Qwen)): 89.11 (80.2 / $0.90/M)
- 4. Mixtral 8x22B (Mistral): 0.00 (0.0 / $1.20/M)
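The value figures in this list can be reproduced with a few lines of Python. This is a minimal sketch, not the site's actual implementation: the `Entry` class and `value` function are hypothetical names introduced for illustration, and the scores and prices are copied from the list as displayed. Because the displayed scores are rounded to one decimal, a recomputed value can differ slightly from the listed one (for example, 81.9 / 0.88 gives about 93.07 rather than 93.05).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical structure, for illustration only)."""
    model: str
    score: float              # normalized benchmark score, 0-100 within the cohort
    input_price_per_m: float  # USD per million input tokens


def value(entry: Entry) -> float:
    """Value = normalized score divided by input price per million tokens."""
    return entry.score / entry.input_price_per_m


# Scores and prices as displayed in the list above (rounded by the page).
entries = [
    Entry("Mixtral 8x22B", 0.0, 1.20),
    Entry("Qwen2.5 72B Instruct", 80.2, 0.90),
    Entry("Llama 3.1 70B Instruct", 81.9, 0.88),
    Entry("Llama 3.3 70B Instruct", 100.0, 0.88),
]

# Higher value means more benchmark score per dollar of input tokens.
ranked = sorted(entries, key=value, reverse=True)

for rank, entry in enumerate(ranked, start=1):
    print(f"{rank}. {entry.model}: {value(entry):.2f}")
```

Sorting by this ratio alone ignores output-token pricing, latency, and context length, so the ordering only reflects score per input dollar.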
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.