Scenario guide
Best AI models for Bulk Classification
Classifying documents or extracting fields from them, at a scale of millions of documents. Once quality clears a threshold, further gains barely matter; cost dominates.
Rankings use the same scenario weights and cost blending as the interactive leaderboard on AI Model Analyzer. Data is min-max normalised per benchmark; missing scores are skipped without penalty.
- 1. GPT-5 nano, OpenAI, Score 82.9, Q 51.2, In $0.05/M
- 2. Gemini 2.0 Flash, Google, Score 81.6, Q 62.2, In $0.10/M
- 3. Qwen3 235B (Thinking), Alibaba (Qwen), Score 81.0, Q 81.4, In $0.20/M
- 4. DeepSeek V3 (Thinking), DeepSeek, Score 77.7, Q 84.8, In $0.27/M
- 5. DeepSeek V3, DeepSeek, Score 77.6, Q 84.4, In $0.27/M
- 6. Gemini 1.5 Flash, Google, Score 77.3, Q 40.4, In $0.08/M
- 7. Gemini 3 Flash, Google, Score 76.0, Q 92.3, In $0.30/M
- 8. Qwen3 235B, Alibaba (Qwen), Score 75.6, Q 65.9, In $0.20/M
- 9. GLM-4.6, Zhipu AI (GLM), Score 72.8, Q 90.1, In $0.50/M
- 10. Gemini 2.5 Flash, Google, Score 72.6, Q 82.6, In $0.30/M
- 11. GLM-4.7, Zhipu AI (GLM), Score 72.0, Q 87.8, In $0.50/M
- 12. GPT-5 mini, OpenAI, Score 70.1, Q 68.5, In $0.25/M
- 13. DeepSeek R1, DeepSeek, Score 69.8, Q 86.1, In $0.55/M
- 14. GPT-4o mini, OpenAI, Score 69.2, Q 40.3, In $0.15/M
- 15. Kimi K2, Moonshot (Kimi), Score 67.9, Q 84.0, In $0.60/M
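
The scoring note above the rankings mentions per-benchmark min-max normalisation with missing scores skipped without penalty. The sketch below shows one way that could work. It is not the leaderboard's actual code. The benchmark names, model names, weights, and the choice to renormalise weights over the benchmarks that are present are all assumptions made for illustration. The scenario weights and the cost blending are not specified here, so they are not reproduced.

```python
from typing import Dict, Optional

# model -> benchmark -> raw score (None when the model has no score)
Scores = Dict[str, Dict[str, Optional[float]]]


def min_max_normalise(raw: Scores) -> Scores:
    """Rescale each benchmark to [0, 1] across the models that have a score for it."""
    benchmarks = {b for per_model in raw.values() for b in per_model}
    normalised: Scores = {model: {} for model in raw}
    for bench in benchmarks:
        present = [s[bench] for s in raw.values() if s.get(bench) is not None]
        if not present:
            continue  # nobody has this benchmark; leave it out entirely
        lo, hi = min(present), max(present)
        for model, per_model in raw.items():
            value = per_model.get(bench)
            if value is None:
                normalised[model][bench] = None  # keep the gap, do not treat it as 0
            elif hi == lo:
                normalised[model][bench] = 0.5  # no spread; arbitrary midpoint (assumption)
            else:
                normalised[model][bench] = (value - lo) / (hi - lo)
    return normalised


def weighted_quality(
    norm: Dict[str, Optional[float]], weights: Dict[str, float]
) -> Optional[float]:
    """Weighted mean over the benchmarks a model actually has.

    Weights are renormalised over the present benchmarks, so a missing
    score neither helps nor hurts (one reading of "skipped without penalty").
    """
    used = [(norm[b], w) for b, w in weights.items() if norm.get(b) is not None]
    total = sum(w for _, w in used)
    if total == 0:
        return None
    return sum(v * w for v, w in used) / total


# Hypothetical data: three models, two benchmarks, one missing score.
raw_scores: Scores = {
    "model_x": {"bench_a": 50.0, "bench_b": 80.0},
    "model_y": {"bench_a": 100.0, "bench_b": None},
    "model_z": {"bench_a": 75.0, "bench_b": 60.0},
}
example_weights = {"bench_a": 1.0, "bench_b": 1.0}  # hypothetical scenario weights

normalised_scores = min_max_normalise(raw_scores)
quality = {
    model: weighted_quality(per_model, example_weights)
    for model, per_model in normalised_scores.items()
}
```

In this example `model_y` is scored on `bench_a` alone rather than being assigned a zero for the benchmark it lacks, which is the behaviour the note describes. A real implementation might instead require a minimum number of benchmarks before ranking a model.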