Value ranking

Best value on IFEval

IFEval is a verifiable instruction-following benchmark covering 25 categories of strict formatting and structural directives.

“Value” is the normalized benchmark score (scaled 0–100 within this leaderboard cohort) divided by the input price in USD per million tokens. A higher value means more benchmark capability per dollar on this axis only. Output-token pricing is not counted, so also check latency, context length, and results on your real workload.
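The page does not say how scores are normalized. One scheme consistent with the ranking below, where the top model shows 100.0 and the bottom model 0.0, is min-max scaling within the cohort. The sketch assumes that scheme; `min_max_normalize` and `value` are hypothetical helper names, not part of any AI Model Analyzer API.

```python
def min_max_normalize(raw_scores: dict[str, float]) -> dict[str, float]:
    """Rescale raw scores so the cohort's best model maps to 100 and its worst to 0.

    Assumption: min-max scaling is a guess at the leaderboard's (undocumented) method.
    """
    lo = min(raw_scores.values())
    hi = max(raw_scores.values())
    if hi == lo:
        # Degenerate cohort where every model ties; mapping all to 100 is an
        # arbitrary choice made for this sketch.
        return {name: 100.0 for name in raw_scores}
    return {name: 100.0 * (s - lo) / (hi - lo) for name, s in raw_scores.items()}


def value(normalized_score: float, input_price_per_m: float) -> float:
    """Value = normalized score / input price in USD per million tokens."""
    if input_price_per_m <= 0:
        raise ValueError("input price must be positive")
    return normalized_score / input_price_per_m


# Synthetic cohort, for illustration only (not real IFEval results).
print(min_max_normalize({"a": 50.0, "b": 75.0, "c": 100.0}))
print(round(value(100.0, 0.88), 2))
```

If the leaderboard does use min-max scaling, a score of 0.0 means "lowest in this cohort" rather than "scored zero on IFEval", and the normalized numbers shift whenever a model joins or leaves the cohort.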

Rank. Model, provider: value (normalized score / input price per million tokens)

1. Llama 3.3 70B Instruct, Meta: 113.64 (100.0 / $0.88/M)
2. Llama 3.1 70B Instruct, Meta: 93.05 (81.9 / $0.88/M)
3. Qwen2.5 72B Instruct, Alibaba (Qwen): 89.11 (80.2 / $0.90/M)
4. Mixtral 8x22B, Mistral: 0.00 (0.0 / $1.20/M)
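As a sanity check on the ranking, the value column can be recomputed from the displayed scores and prices. This is a minimal sketch using only the numbers shown on this page; `rank_by_value` is a hypothetical helper.

```python
# Displayed normalized scores and input prices (USD per million tokens),
# copied from the ranking on this page.
rows = [
    ("Llama 3.3 70B Instruct", "Meta", 100.0, 0.88),
    ("Llama 3.1 70B Instruct", "Meta", 81.9, 0.88),
    ("Qwen2.5 72B Instruct", "Alibaba (Qwen)", 80.2, 0.90),
    ("Mixtral 8x22B", "Mistral", 0.0, 1.20),
]


def rank_by_value(rows):
    """Return (rank, model, provider, value) tuples sorted by score / price, highest first."""
    scored = [(model, provider, score / price) for model, provider, score, price in rows]
    scored.sort(key=lambda r: r[2], reverse=True)
    return [
        (i, model, provider, round(v, 2))
        for i, (model, provider, v) in enumerate(scored, start=1)
    ]


for rank, model, provider, v in rank_by_value(rows):
    print(f"{rank}. {model}, {provider}: {v:.2f}")
```

The order matches the list, but recomputing from the rounded score 81.9 gives 93.07 for Llama 3.1 70B Instruct rather than the listed 93.05. That gap is presumably because the leaderboard divides an unrounded score (for example, one slightly below 81.9) before rounding for display.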


AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.