Scenario guide
Best AI models for Multimodal Document Q&A
This scenario covers answering questions about scanned PDFs, charts, and screenshots. It relies on vision benchmarks plus reasoning benchmarks.
Rankings use the same scenario weights and cost blending as the interactive leaderboard on AI Model Analyzer. Data is min-max normalised per benchmark; missing scores are skipped without penalty.
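As a rough illustration of the normalisation step described above, the sketch below rescales each benchmark with min-max normalisation and then takes a weighted mean that skips missing scores by renormalising the weights over the benchmarks a model actually has. The benchmark names, model names, raw scores, weights, the 0 to 100 scale, and the handling of ties are all assumptions made for the example; the leaderboard's actual weights and its cost-blending formula are not stated here, so cost blending is left out.

```python
from typing import Dict, Optional

Scores = Dict[str, Optional[float]]  # model name -> raw score, None if missing


def min_max_normalise(scores: Scores) -> Scores:
    """Rescale one benchmark's scores to 0..100; missing scores stay None."""
    present = [v for v in scores.values() if v is not None]
    if not present:
        return {model: None for model in scores}
    lo, hi = min(present), max(present)
    span = hi - lo
    normalised: Scores = {}
    for model, value in scores.items():
        if value is None:
            normalised[model] = None
        elif span == 0:
            # Assumption: if every model ties, give them all the top value.
            normalised[model] = 100.0
        else:
            normalised[model] = 100.0 * (value - lo) / span
    return normalised


def scenario_quality(
    normalised: Dict[str, Scores], weights: Dict[str, float], model: str
) -> Optional[float]:
    """Weighted mean over the benchmarks this model has a score for.

    Missing benchmarks are skipped and the remaining weights are
    renormalised, so a gap does not count as a zero.
    """
    total = 0.0
    weight_sum = 0.0
    for bench, weight in weights.items():
        value = normalised.get(bench, {}).get(model)
        if value is None:
            continue
        total += weight * value
        weight_sum += weight
    return None if weight_sum == 0 else total / weight_sum


# Hypothetical raw data: two made-up benchmarks, three made-up models.
raw = {
    "vision_bench": {"model_a": 80.0, "model_b": 60.0, "model_c": 70.0},
    "reasoning_bench": {"model_a": 50.0, "model_b": None, "model_c": 90.0},
}
weights = {"vision_bench": 0.6, "reasoning_bench": 0.4}  # assumed weights

normalised = {bench: min_max_normalise(s) for bench, s in raw.items()}
quality = {
    model: scenario_quality(normalised, weights, model)
    for model in ("model_a", "model_b", "model_c")
}
for model, q in sorted(quality.items(), key=lambda kv: kv[1] or 0.0, reverse=True):
    print(model, q)
```

Under this scheme a model with a missing benchmark is judged only on the benchmarks it does have. That is one plausible reading of "skipped without penalty", not a confirmed description of how the leaderboard is computed.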
- 1. Gemini 2.5 Pro (Google): Score 85.1, Q 95.9, In $1.25/M
- 2. Claude Opus 4 (Anthropic): Score 73.1, Q 91.4, In $15.00/M
- 3. Gemini 2.0 Flash (Google): Score 69.9, Q 64.4, In $0.10/M
- 4. Llama 4 Maverick (Meta): Score 68.9, Q 67.3, In $0.27/M
- 5. Claude Sonnet 4 (Anthropic): Score 66.1, Q 75.3, In $3.00/M
- 6. Llama 4 Scout (Meta): Score 60.2, Q 54.7, In $0.18/M
- 7. Claude 3.5 Sonnet (Anthropic): Score 50.2, Q 55.5, In $3.00/M
- 8. GPT-4o (OpenAI): Score 45.5, Q 48.5, In $2.50/M
- 9. Gemini 1.5 Pro (Google): Score 44.7, Q 44.3, In $1.25/M
- 10. Gemini 1.5 Flash (Google): Score 34.3, Q 18.6, In $0.08/M
- 11. GPT-4o mini (OpenAI): Score 34.0, Q 21.3, In $0.15/M
- 12. GPT-5 nano (OpenAI): Score 20.0, Q 0.0, In $0.05/M
- 13. GPT-5 mini (OpenAI): Score 14.2, Q 0.0, In $0.25/M
- 14. Gemini 2.5 Flash (Google): Score 13.4, Q 0.0, In $0.30/M
- 15. Gemini 3 Flash (Google): Score 13.4, Q 0.0, In $0.30/M