Qwen2-VL-72B-Instruct

Multimodal
Alibaba

An instruction-tuned large multimodal model that excels at visual understanding and step-by-step reasoning. It accepts image and video input, processes images at dynamic resolution, and uses Multimodal Rotary Position Embedding (M-RoPE) for positional encoding. These features enable complex problem-solving, multilingual text recognition in images, and agentic interaction in video contexts.
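
To make "dynamic resolution processing" more concrete, here is a minimal sketch of how the number of visual tokens could scale with image size. It assumes the 14-pixel patch size and 2x2 patch merging described in the Qwen2-VL release, so that each visual token covers roughly 28x28 pixels. The function name `estimate_visual_tokens` is hypothetical, the rounding is simplified, and the sketch ignores any minimum/maximum pixel limits and special tokens the real processor may apply.

```python
PATCH = 14              # assumed ViT patch size in pixels
MERGE = 2               # assumed 2x2 merging of adjacent patch features
FACTOR = PATCH * MERGE  # under these assumptions, one token covers 28x28 pixels


def estimate_visual_tokens(height: int, width: int) -> int:
    """Rough visual-token count for one image at its native resolution.

    Hypothetical helper for illustration only; not the library's API.
    """
    # Round each side to the nearest multiple of FACTOR, with at least one cell.
    rows = max(1, round(height / FACTOR))
    cols = max(1, round(width / FACTOR))
    return rows * cols


if __name__ == "__main__":
    for h, w in [(224, 224), (720, 1280)]:
        print(f"{w}x{h}: about {estimate_visual_tokens(h, w)} visual tokens")
```

The point of the sketch is that larger images yield proportionally more tokens instead of being squashed to a fixed input size, which is what distinguishes dynamic resolution from a fixed-resolution encoder.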

Key Specifications

Parameters
73.4B
Context
-
Release Date
August 29, 2024
Average Score
75.8%

Timeline

Key dates in the model's history
Announcement
August 29, 2024
Last Update
July 19, 2025

Technical Specifications

Parameters
73.4B
Training Tokens
-
Knowledge Cutoff
June 30, 2023
Family
-
Capabilities
Multimodal, ZeroEval

Benchmark Results

Model performance metrics across various tests and benchmarks

Multimodal

Working with images and visual data
ChartQA
Score, self-reported
88.3%

Other Tests

Specialized benchmarks
DocVQA_test
Score, self-reported
96.5%
EgoSchema
Score, self-reported
77.9%
InfoVQA_test
Evaluation, self-reported
84.5%
MathVista-Mini
Score, self-reported
70.5%
MMBench_test
Score, self-reported
86.5%
MMMU-Pro
Score, self-reported
46.2%
MMMU_val
Score, self-reported
64.5%
MMVet_GPT4Turbo
Score, self-reported
74.0%
MTVQA
Score, self-reported
30.9%
MVBench
Score, self-reported
73.6%
OCRBench
Evaluation AI: ChatGPT 4o, self-reported
87.7%
RealWorldQA
Score, self-reported
77.8%
TextVQA
Score, self-reported
85.5%
VCR_en_easy
Evaluation AI: ChatGPT (GPT-4), self-reported
91.9%
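
The Average Score of 75.8% listed under Key Specifications appears consistent with an unweighted mean of the 15 self-reported scores above. The page does not state how the average is computed, so the simple arithmetic mean used in this sketch is an assumption; the variable names are illustrative.

```python
# Self-reported scores (percent) as listed in the benchmark results above.
scores = {
    "ChartQA": 88.3,
    "DocVQA_test": 96.5,
    "EgoSchema": 77.9,
    "InfoVQA_test": 84.5,
    "MathVista-Mini": 70.5,
    "MMBench_test": 86.5,
    "MMMU-Pro": 46.2,
    "MMMU_val": 64.5,
    "MMVet_GPT4Turbo": 74.0,
    "MTVQA": 30.9,
    "MVBench": 73.6,
    "OCRBench": 87.7,
    "RealWorldQA": 77.8,
    "TextVQA": 85.5,
    "VCR_en_easy": 91.9,
}

# Assumption: the site's "Average Score" is a plain unweighted mean.
average = sum(scores.values()) / len(scores)

if __name__ == "__main__":
    print(f"{len(scores)} benchmarks, mean = {average:.1f}%")
```

Because the mean is unweighted, the low MTVQA and MMMU-Pro scores pull the headline figure down as much as any other single benchmark.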

License & Metadata

License
tongyi_qianwen
Announcement Date
August 29, 2024
Last Updated
July 19, 2025
