Best AI models for Bulk Classification
Classifying documents, or extracting fields from them, at the scale of millions. Past a minimum quality threshold, extra quality buys little, so cost dominates the choice.
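The threshold-then-cost rule can be sketched as: drop every model below a quality floor, then take the cheapest of the rest. This is a minimal sketch, not the AI Model Analyzer scoring. The quality floors and the `cheapest_above_floor` helper are hypothetical, and the four candidates are copied from the ranking on this page (quality "Q" and input price "In").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    quality: float          # "Q" value from the ranking
    input_usd_per_m: float  # "In" value: USD per million input tokens


def cheapest_above_floor(models, quality_floor):
    """Return the cheapest model whose quality meets the floor, or None.

    The floor is a hypothetical, user-chosen threshold. Past it, quality
    only breaks ties on price (higher quality wins a tie).
    """
    eligible = [m for m in models if m.quality >= quality_floor]
    if not eligible:
        return None
    return min(eligible, key=lambda m: (m.input_usd_per_m, -m.quality))


candidates = [
    Model("GPT-5 nano", 47.2, 0.05),
    Model("Gemini 2.0 Flash", 57.4, 0.10),
    Model("Qwen3 235B (Thinking)", 75.2, 0.20),
    Model("Gemini 3 Flash", 91.1, 0.30),
]

print(cheapest_above_floor(candidates, 45).name)
print(cheapest_above_floor(candidates, 70).name)
```

With a floor of 45 every candidate qualifies and the cheapest (GPT-5 nano) wins. Raising the floor to 70 leaves only Qwen3 235B (Thinking) and Gemini 3 Flash, and the cheaper of the two is chosen.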
Rankings use the same scenario weights and cost blending as the interactive leaderboard on AI Model Analyzer. Data is min-max normalised per benchmark; missing scores are skipped without penalty.
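Per-benchmark min-max normalisation that skips missing scores can be sketched as follows. The benchmark names, the models "A", "B", "C" and their scores are hypothetical. The site's scenario weights are not published here, so an unweighted mean over each model's available benchmarks stands in for them (an assumption). Treating a benchmark where all models tie as 1.0 is also an assumption.

```python
def min_max_normalise(scores):
    """Min-max normalise one benchmark's scores to [0, 1].

    `scores` maps model name -> raw score, or None when the model was not
    evaluated. Missing entries are dropped rather than set to 0, so they
    carry no penalty.
    """
    present = {m: s for m, s in scores.items() if s is not None}
    if not present:
        return {}
    lo, hi = min(present.values()), max(present.values())
    if hi == lo:
        # Assumption: with no spread, every present model gets the top value.
        return {m: 1.0 for m in present}
    return {m: (s - lo) / (hi - lo) for m, s in present.items()}


def average_normalised(benchmarks):
    """Unweighted mean of each model's normalised scores over the benchmarks it has.

    Assumption: stands in for the leaderboard's (unpublished) scenario weights.
    """
    totals, counts = {}, {}
    for scores in benchmarks.values():
        for model, value in min_max_normalise(scores).items():
            totals[model] = totals.get(model, 0.0) + value
            counts[model] = counts.get(model, 0) + 1
    return {m: totals[m] / counts[m] for m in totals}


# Hypothetical data: model "B" has no score on bench_b.
benchmarks = {
    "bench_a": {"A": 60.0, "B": 80.0, "C": 100.0},
    "bench_b": {"A": 30.0, "B": None, "C": 50.0},
}
print(average_normalised(benchmarks))
```

Because B is averaged only over the one benchmark it has, its missing bench_b score neither lowers nor raises its result.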
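- #1 GPT-5 nano (OpenAI): score 81.5, quality 47.2, input $0.05 per 1M tokens
- #2 Gemini 2.0 Flash (Google): score 79.9, quality 57.4, input $0.10 per 1M tokens
- #3 Qwen3 235B (Thinking) (Alibaba, Qwen): score 78.8, quality 75.2, input $0.20 per 1M tokens
- #4 Gemini 1.5 Flash (Google): score 76.2, quality 37.3, input $0.08 per 1M tokens
- #5 Gemini 3 Flash (Google): score 75.6, quality 91.1, input $0.30 per 1M tokens
- #6 DeepSeek V3 (Thinking) (DeepSeek): score 75.4, quality 78.2, input $0.27 per 1M tokens
- #7 DeepSeek V3 (DeepSeek): score 75.4, quality 78.1, input $0.27 per 1M tokens
- #8 Qwen3 235B (Alibaba, Qwen): score 73.8, quality 60.9, input $0.20 per 1M tokens
- #9 Gemini 2.5 Flash (Google): score 70.5, quality 76.5, input $0.30 per 1M tokens
- #10 GLM-4.6 (Zhipu AI, GLM): score 70.4, quality 83.2, input $0.50 per 1M tokens
- #11 GLM-4.7 (Zhipu AI, GLM): score 69.9, quality 81.9, input $0.50 per 1M tokens
- #12 GPT-5 mini (OpenAI): score 68.2, quality 63.2, input $0.25 per 1M tokens
- #13 GPT-4o mini (OpenAI): score 68.1, quality 37.2, input $0.15 per 1M tokens
- #14 DeepSeek R1 (DeepSeek): score 67.5, quality 79.4, input $0.55 per 1M tokens
- #15 Llama 4 Scout (Meta): score 65.9, quality 35.5, input $0.18 per 1M tokens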
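To see why cost dominates at this scale, the input-token bill for one pass over a corpus is documents × tokens per document ÷ 1,000,000 × the per-million input price. The sketch below assumes a hypothetical job of 5 million documents at about 800 input tokens each, and uses input prices from the ranking above. Output tokens are excluded because the ranking does not list output prices.

```python
def input_cost_usd(num_docs, tokens_per_doc, usd_per_million_tokens):
    """Estimated input-token cost (USD) of one pass over a corpus."""
    total_tokens = num_docs * tokens_per_doc
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Input prices from the ranking, USD per million input tokens.
prices = {
    "GPT-5 nano": 0.05,
    "Gemini 2.0 Flash": 0.10,
    "GPT-4o mini": 0.15,
    "Gemini 3 Flash": 0.30,
}

# Hypothetical job size: 5 million documents, ~800 input tokens each.
for name, price in prices.items():
    print(f"{name}: ${input_cost_usd(5_000_000, 800, price):,.2f}")
```

At 4 billion input tokens, a $0.05 versus $0.30 per-million price is the difference between roughly $200 and $1,200 per pass, which is the gap the cost blending rewards.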