Value ranking
Best value on Prompt Adherence
How well the generated image matches the textual prompt, as evaluated by human raters.
“Value” is the normalized benchmark score (0–100 for this leaderboard cohort) divided by input price per million tokens. Higher means more capability per dollar on this axis only; always sanity-check latency, context length, and your real workload.
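The metric above can be sketched as a simple ratio. This is a minimal illustration, not the leaderboard's actual code; the model names, scores, and prices below are hypothetical.

```python
def value_score(benchmark_score: float, input_price_per_mtok: float) -> float:
    """Capability per dollar: normalized score (0-100) / input price per 1M tokens (USD)."""
    if input_price_per_mtok <= 0:
        raise ValueError("input price must be positive")
    return benchmark_score / input_price_per_mtok

# Illustrative entries only -- not real leaderboard data.
models = {
    "model-a": {"score": 82.0, "price": 2.50},
    "model-b": {"score": 74.0, "price": 0.80},
}

# Rank by value, highest first.
ranked = sorted(
    models.items(),
    key=lambda kv: value_score(kv[1]["score"], kv[1]["price"]),
    reverse=True,
)
# model-b ranks first: 74 / 0.80 = 92.5 beats 82 / 2.50 = 32.8
```

Note that a cheap mid-scoring model can outrank a stronger but pricier one, which is exactly why the caption advises checking latency and workload fit rather than ranking on value alone.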
No models have both pricing and a score for this benchmark yet.
AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.