
Head-to-head

GPT-5.5 vs Qwen3 235B

Scores are min-max normalized per benchmark to a 0–100 scale, using the range across all models we track. Open the interactive compare view to add benchmarks to the radar chart or pull in more models.
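As a rough illustration of the per-benchmark min-max scaling described above, here is a minimal sketch. The function name `min_max_normalize`, the `higher_is_better` flag, the midpoint used for ties, and the sample scores are all assumptions made for this example; the page does not say how lower-is-better metrics (such as Time to First Token) or ties are actually handled.

```python
from typing import Dict


def min_max_normalize(
    raw_scores: Dict[str, float], higher_is_better: bool = True
) -> Dict[str, float]:
    """Rescale one benchmark's raw scores to 0-100 across all models."""
    lo = min(raw_scores.values())
    hi = max(raw_scores.values())
    if hi == lo:
        # Assumption: if every model ties, place them all at the midpoint.
        return {model: 50.0 for model in raw_scores}
    normalized = {}
    for model, score in raw_scores.items():
        scaled = 100.0 * (score - lo) / (hi - lo)
        # Assumption: lower-is-better metrics (e.g. latency) flip the scale.
        normalized[model] = scaled if higher_is_better else 100.0 - scaled
    return normalized


# Made-up raw scores for three hypothetical models on one benchmark.
example = {"model_a": 40.0, "model_b": 70.0, "model_c": 90.0}
print(min_max_normalize(example))
```

Because the minimum and maximum come from the whole tracked set, a normalized score of 0 or 100 only means a model is the weakest or strongest among tracked models on that benchmark, not that it scored 0% or 100% on the benchmark itself.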

GPT-5.5, OpenAI
Qwen3 235B, Alibaba (Qwen)
Benchmark | GPT-5.5 | Qwen3 235B

Chatbot Arena Elo (Arena)
MMLU-Pro
GPQA Diamond (GPQA)
MATH-500 (MATH)
AIME 2024 (AIME)
HumanEval
LiveCodeBench (LiveCB)
IFEval
RULER 128k (RULER)
Output Speed (Speed)
Time to First Token (TTFT)
Rolling Contamination-Controlled Average (Rolling Avg)
Rolling Data Analysis (Data Analysis)
FrontierMath Tiers 1-3 (FrontierMath)
SimpleQA Verified (SimpleQA)
OTIS Mock AIME 2024-2025 (OTIS AIME)
ARC-AGI 2
Aider Polyglot (Aider)
Terminal-Bench 2 (TermBench 2)
Frontier Composite (Frontier)
Output Stability (Stability)
Format Adherence (Format)
Recovery Rate (Recovery)
Safety Handling (Safety)

Methodology matches the main AI Model Analyzer About page.