Scenario guide

Best AI models for Bulk Classification

Classifying documents or extracting fields from them, at a scale of millions of documents. Past a minimum quality threshold, extra quality barely matters; cost dominates.

Rankings use the same scenario weights and cost blending as the interactive leaderboard on AI Model Analyzer. Data is min-max normalised per benchmark; missing scores are skipped without penalty.
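The normalisation and skip-without-penalty rule described above can be sketched in Python. This is a minimal illustration, not the leaderboard's actual code: the benchmark names, model names, and weights are hypothetical, and renormalising the remaining weights is only one plausible reading of "skipped without penalty". The cost-blending step is left out because the page does not give its formula.

```python
from typing import Optional

Scores = dict[str, Optional[float]]  # model name -> raw score, None if missing


def min_max_normalise(scores: Scores) -> Scores:
    """Scale one benchmark's raw scores to [0, 1]; missing values stay None."""
    present = [v for v in scores.values() if v is not None]
    if not present:
        return {model: None for model in scores}
    lo, hi = min(present), max(present)
    span = hi - lo
    return {
        model: None if v is None else (0.0 if span == 0 else (v - lo) / span)
        for model, v in scores.items()
    }


def weighted_quality(
    normalised: dict[str, Scores], weights: dict[str, float], model: str
) -> Optional[float]:
    """Weighted mean over the benchmarks a model actually has.

    Assumption: a missing benchmark is dropped and the remaining weights are
    renormalised, so a gap neither helps nor hurts the model.
    """
    total = 0.0
    weight_sum = 0.0
    for bench, weight in weights.items():
        value = normalised[bench].get(model)
        if value is None:
            continue  # skipped without penalty
        total += weight * value
        weight_sum += weight
    return None if weight_sum == 0 else total / weight_sum


# Hypothetical raw data: model_c has no score on bench_a.
raw = {
    "bench_a": {"model_a": 10.0, "model_b": 20.0, "model_c": None},
    "bench_b": {"model_a": 50.0, "model_b": 70.0, "model_c": 90.0},
}
weights = {"bench_a": 2.0, "bench_b": 1.0}  # hypothetical scenario weights

normalised = {bench: min_max_normalise(scores) for bench, scores in raw.items()}
quality = {m: weighted_quality(normalised, weights, m) for m in raw["bench_b"]}
```

Here `model_c` is scored on `bench_b` alone, so its missing `bench_a` result does not drag its average down.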

Rank  Model                   Provider         Score  Q     In
1     GPT-5 nano              OpenAI           82.9   51.2  $0.05/M
2     Gemini 2.0 Flash        Google           81.6   62.2  $0.10/M
3     Qwen3 235B (Thinking)   Alibaba (Qwen)   81.0   81.4  $0.20/M
4     DeepSeek V3 (Thinking)  DeepSeek         77.7   84.8  $0.27/M
5     DeepSeek V3             DeepSeek         77.6   84.4  $0.27/M
6     Gemini 1.5 Flash        Google           77.3   40.4  $0.08/M
7     Gemini 3 Flash          Google           76.0   92.3  $0.30/M
8     Qwen3 235B              Alibaba (Qwen)   75.6   65.9  $0.20/M
9     GLM-4.6                 Zhipu AI (GLM)   72.8   90.1  $0.50/M
10    Gemini 2.5 Flash        Google           72.6   82.6  $0.30/M
11    GLM-4.7                 Zhipu AI (GLM)   72.0   87.8  $0.50/M
12    GPT-5 mini              OpenAI           70.1   68.5  $0.25/M
13    DeepSeek R1             DeepSeek         69.8   86.1  $0.55/M
14    GPT-4o mini             OpenAI           69.2   40.3  $0.15/M
15    Kimi K2                 Moonshot (Kimi)  67.9   84.0  $0.60/M
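
The "quality past a threshold, then cost" idea behind this scenario can be sketched as a filter followed by a price comparison. This is a rough illustration under stated assumptions: it reads "Q" as a quality score and "In" as the input price in USD per million tokens, uses a handful of entries from the list above, and the threshold, document count, and tokens-per-document figures are hypothetical. Output-token costs are ignored because the list does not show them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    quality: float      # assumed meaning of the "Q" value
    input_price: float  # assumed meaning of "In": USD per million input tokens


# A few entries copied from the ranking above.
MODELS = [
    Model("GPT-5 nano", 51.2, 0.05),
    Model("Gemini 2.0 Flash", 62.2, 0.10),
    Model("Qwen3 235B (Thinking)", 81.4, 0.20),
    Model("DeepSeek V3", 84.4, 0.27),
    Model("Gemini 3 Flash", 92.3, 0.30),
]


def cheapest_above(models: list[Model], min_quality: float) -> Optional[Model]:
    """Return the lowest-priced model whose quality meets the threshold."""
    eligible = [m for m in models if m.quality >= min_quality]
    return min(eligible, key=lambda m: m.input_price) if eligible else None


def input_cost_usd(model: Model, n_docs: int, tokens_per_doc: int) -> float:
    """Estimate input-token spend only; output tokens are not included."""
    return n_docs * tokens_per_doc / 1_000_000 * model.input_price


# Hypothetical workload: 5 million documents of about 800 input tokens each.
choice = cheapest_above(MODELS, min_quality=60.0)
if choice is not None:
    cost = input_cost_usd(choice, n_docs=5_000_000, tokens_per_doc=800)
    print(f"{choice.name}: about ${cost:,.0f} in input tokens")
```

Raising the threshold moves the choice to a pricier model, which is the trade-off the scenario weights are meant to capture.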
