Value ranking

Best value on GPQA Diamond

Graduate-level, Google-proof Q&A in physics, chemistry, and biology. The Diamond subset is the hardest tier, with PhD-validated answers.

“Value” is the normalized benchmark score (0–100 within this leaderboard cohort) divided by the input price per million tokens. A higher value means more capability per dollar on this axis only; always sanity-check latency, context length, and your real workload.
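As a minimal sketch of the ratio described above, assuming the normalized score and the input price are already known. The function name `value_score` and its input checks are hypothetical illustrations, not part of AI Model Analyzer:

```python
def value_score(normalized_score: float, input_price_per_m: float) -> float:
    """Normalized benchmark score (0-100) divided by input price per million tokens."""
    # Assumption: scores outside 0-100 or non-positive prices are treated as invalid input.
    if not 0.0 <= normalized_score <= 100.0:
        raise ValueError("normalized score must be within 0-100")
    if input_price_per_m <= 0:
        raise ValueError("input price must be positive")
    return normalized_score / input_price_per_m


# Example: a normalized score of 100.0 at $0.90 per million input tokens.
print(round(value_score(100.0, 0.90), 2))  # 111.11
```

Because the price is the denominator, a cheaper model can outrank a stronger one, which is why the figure should be read alongside the raw score.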

  1. Qwen2.5 72B Instruct
     Alibaba (Qwen)
     Value: 111.11 (score 100.0 / $0.90/M)
  2. Mixtral 8x22B
     Mistral
     Value: 80.30 (score 96.4 / $1.20/M)
  3. Llama 3.1 70B Instruct
     Meta
     Value: 68.18 (score 60.0 / $0.88/M)
  4. Llama 3.3 70B Instruct
     Meta
     Value: 0.00 (score 0.0 / $0.88/M)
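
The ordering above can be re-derived from the displayed scores and prices. The following is a rough sketch; the `Row` type and its field names are hypothetical, and the numbers are copied from the list as shown:

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    score: float   # normalized score as displayed (one decimal place)
    price: float   # input price per million tokens, in dollars
    listed: float  # value as displayed in the ranking


# Deliberately unsorted, so the sort below does the ranking.
ROWS = [
    Row("Llama 3.3 70B Instruct", 0.0, 0.88, 0.00),
    Row("Mixtral 8x22B", 96.4, 1.20, 80.30),
    Row("Qwen2.5 72B Instruct", 100.0, 0.90, 111.11),
    Row("Llama 3.1 70B Instruct", 60.0, 0.88, 68.18),
]

# Rank by score / price, highest value first.
ranked = sorted(ROWS, key=lambda r: r.score / r.price, reverse=True)

for rank, row in enumerate(ranked, start=1):
    recomputed = row.score / row.price
    print(f"{rank}. {row.model}: {recomputed:.2f} (listed {row.listed:.2f})")
```

The recomputed figure for Mixtral 8x22B differs slightly from the listed 80.30. That would be consistent with the displayed score being rounded before display, though the page does not say so.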

AI Model Analyzer does not recommend specific vendors; rankings are derived from public data only.