o1
Pricing verified 1 year ago
Benchmarks
preference
Crowdsourced pairwise human preference rankings of LLM responses. Higher Elo means more frequently preferred by users.
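For a concrete sense of how pairwise votes turn into an Elo number, here is a minimal sketch of the standard Elo update; the K-factor, starting rating, and model names are illustrative assumptions, not the leaderboard's actual parameters.

```python
# Minimal Elo update from pairwise preference votes.
# K_FACTOR and the 1000-point starting rating are illustrative assumptions,
# not the parameters used by any real leaderboard.

K_FACTOR = 32
START_RATING = 1000.0

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(ratings: dict, winner: str, loser: str) -> None:
    """Apply one pairwise vote: `winner` was preferred over `loser`."""
    ra = ratings.setdefault(winner, START_RATING)
    rb = ratings.setdefault(loser, START_RATING)
    ea = expected_score(ra, rb)
    ratings[winner] = ra + K_FACTOR * (1.0 - ea)
    ratings[loser] = rb - K_FACTOR * (1.0 - ea)

# Example: three votes between two hypothetical models.
ratings = {}
for w, l in [("model-x", "model-y"), ("model-x", "model-y"), ("model-y", "model-x")]:
    update(ratings, w, l)
print(ratings)  # higher Elo = more frequently preferred
```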
math
American Invitational Mathematics Examination 2024 problems. Answers are integers from 0 to 999, graded by exact match; very hard for non-reasoning models.
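A minimal grading sketch, assuming a simple "take the last integer in the output" extraction rule (the regex is an assumption, not the benchmark's official harness):

```python
import re

def grade_aime_answer(model_output: str, reference: int) -> bool:
    """Exact-match grading for AIME-style answers (integers 0-999).
    Extraction rule is a simplifying assumption: take the last integer
    that appears in the model's output."""
    matches = re.findall(r"\d+", model_output)
    if not matches:
        return False
    predicted = int(matches[-1])
    return 0 <= predicted <= 999 and predicted == reference

print(grade_aime_answer("The answer is 073.", 73))   # True
print(grade_aime_answer("I think it's 1024.", 24))   # False
```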
coding
164 hand-written Python programming problems, each scored by whether the generated solution passes its unit tests. Saturated for frontier models.
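Benchmarks of this kind are usually reported as pass@k. A minimal sketch of the standard unbiased estimator (n samples per problem, c of which pass the tests):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate given n generated samples per problem,
    of which c pass the unit tests. Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples per problem, 3 pass the tests.
print(round(pass_at_k(n=10, c=3, k=1), 3))  # 0.3
```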
vision
Massive Multi-discipline Multimodal Understanding; college-exam level questions with images across 30+ subjects.
Math reasoning over visual contexts (charts, figures, geometry).
long context
Long-context retrieval and reasoning suite. We report the 128k-token effective-context score.
performance
Median sustained output speed in tokens per second on the model's first-party API for medium-length prompts. Higher is faster.
Median time from request to first output chunk in milliseconds on the model's first-party API for medium-length prompts. Lower is snappier; reasoning models are penalised here because they think before talking.
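Both numbers can be measured from any streaming response. A minimal sketch, assuming a generic iterator of text chunks and a tokenizer callback supplied by whatever client library is in use; in practice the median is taken over many requests:

```python
import time
from typing import Callable, Iterable, Tuple

def measure_stream(chunks: Iterable[str],
                   count_tokens: Callable[[str], int]) -> Tuple[float, float]:
    """Measure time-to-first-chunk (ms) and sustained output speed (tokens/s)
    over one streamed response. `chunks` is any iterator of text deltas from a
    streaming API; `count_tokens` is a tokenizer callback. Both are assumptions
    here, supplied by whichever client library you use."""
    start = time.monotonic()
    first_chunk_ms = None
    total_tokens = 0
    for chunk in chunks:
        now = time.monotonic()
        if first_chunk_ms is None:
            first_chunk_ms = (now - start) * 1000.0
        total_tokens += count_tokens(chunk)
    if first_chunk_ms is None:
        first_chunk_ms = 0.0  # empty stream
    elapsed = time.monotonic() - start
    tokens_per_s = total_tokens / elapsed if elapsed > 0 else 0.0
    return first_chunk_ms, tokens_per_s

# Example with a fake in-memory stream and a whitespace "tokenizer".
fake_stream = iter(["Reasoning models ", "think before ", "they talk."])
ttft_ms, tps = measure_stream(fake_stream, lambda s: len(s.split()))
print(f"TTFT: {ttft_ms:.1f} ms, speed: {tps:.1f} tok/s")
```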
composite
Saturation-resistant composite capability score stitched together from ~40 underlying benchmarks using Item Response Theory. Each benchmark is weighted by its fitted difficulty and discriminative slope, so doing well on hard, contamination-resistant evals (FrontierMath, ARC-AGI 2, Humanity's Last Exam) moves the score the most, while saturated benchmarks contribute almost nothing. Imported per-model from Epoch AI's published index; we anchor it to the same min-max scale we use for every other benchmark so it's directly weightable in scenarios.
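The IRT fitting itself happens upstream in the published index; on our side the anchoring and scenario weighting are simple arithmetic. A minimal sketch with made-up numbers:

```python
def min_max_anchor(value: float, lo: float, hi: float) -> float:
    """Anchor an imported score onto the common 0-100 min-max scale.
    `lo` and `hi` are the min and max of the imported index across the
    comparison set (hypothetical values below)."""
    if hi == lo:
        return 0.0
    return 100.0 * (value - lo) / (hi - lo)

def weighted_composite(scores: dict, weights: dict) -> float:
    """Blend per-benchmark scores (already on the 0-100 scale) with
    scenario weights. These weights are user-chosen; the IRT weighting
    happens upstream when the composite index is built."""
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight

# Hypothetical numbers, for illustration only.
anchored = min_max_anchor(value=0.62, lo=0.10, hi=0.85)  # imported index -> 0-100
blended = weighted_composite(
    {"composite": anchored, "coding": 91.0, "math": 79.0},
    {"composite": 0.5, "coding": 0.3, "math": 0.2},
)
print(round(anchored, 1), round(blended, 1))
```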
Hosted endpoints
| Host | Input ($/M tokens) | Output ($/M tokens) | Context | Quantization |
|---|---|---|---|---|
| Host B | $15.00 | $60.00 | 200k | unknown |
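Per-request cost at Host B's listed rates is simple arithmetic. A minimal sketch, assuming hidden reasoning tokens are billed as output tokens (as is typical for reasoning models), with made-up token counts:

```python
# Per-request cost from the Host B rates in the table above ($/M tokens).
INPUT_PER_M = 15.00
OUTPUT_PER_M = 60.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for one request. Hidden reasoning tokens are assumed
    to be billed as output tokens, so include them in output_tokens."""
    return (input_tokens / 1_000_000) * INPUT_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PER_M

# Example: 2k prompt tokens, 10k output tokens (reasoning + visible answer).
print(f"${request_cost(2_000, 10_000):.2f}")  # $0.63
```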