Qwen2-VL-72B-Instruct

Name: Qwen2-VL-72B-Instruct
Author: Alibaba

Multimodal

An instruction-tuned large multimodal model built for visual understanding and step-by-step reasoning. It accepts image and video input, processes images at dynamic resolution, and uses Multimodal Rotary Position Embedding (M-RoPE) for positional encoding. These features support complex problem-solving, multilingual text recognition in images, and agentic interaction in video contexts.
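To give a feel for what dynamic resolution processing means in practice, the sketch below estimates how many visual tokens an image might occupy at different sizes. The constants are assumptions, not stated on this page: a 14-pixel ViT patch and a 2x2 token merge, as described in the Qwen2-VL technical report. The helper names `snap` and `visual_tokens` are hypothetical, and the sketch ignores the minimum/maximum pixel limits and the special start/end tokens that a real preprocessor would apply.

```python
# Rough visual-token estimate under dynamic resolution.
# ASSUMPTIONS (not from this page): patch size 14 and a 2x2 merge of
# adjacent patches, so one visual token covers a 28x28 pixel area.
PATCH_SIZE = 14
MERGE_SIZE = 2
FACTOR = PATCH_SIZE * MERGE_SIZE  # 28 pixels per merged token side


def snap(side: int) -> int:
    """Round a side length to the nearest multiple of FACTOR (at least one unit)."""
    return max(FACTOR, round(side / FACTOR) * FACTOR)


def visual_tokens(height: int, width: int) -> int:
    """Estimate the number of merged visual tokens for an image of the given size."""
    h, w = snap(height), snap(width)
    return (h // FACTOR) * (w // FACTOR)


if __name__ == "__main__":
    for height, width in [(224, 224), (448, 448), (1080, 1920)]:
        print(f"{height}x{width}: ~{visual_tokens(height, width)} visual tokens")
```

Under these assumptions the token count grows with image area rather than being fixed, which is the practical consequence of not resizing every input to a single resolution.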

Key Specifications

Parameters

73.4B

Context

Release Date

August 29, 2024

Average Score

75.8%

Timeline

Key dates in the model's history

Announcement

August 29, 2024

Last Update

July 19, 2025

Technical Specifications

Parameters

73.4B

Training Tokens

Knowledge Cutoff

June 30, 2023

Family

Capabilities

Multimodal, ZeroEval

Benchmark Results

Model performance metrics across various tests and benchmarks

Multimodal

Working with images and visual data

ChartQA

score • Self-reported

88.3%

Other Tests

Specialized benchmarks

DocVQA (test)

score • Self-reported

96.5%

EgoSchema

score • Self-reported

77.9%

InfoVQA (test)

Evaluation • Self-reported

84.5%

MathVista-Mini

score • Self-reported

70.5%

MMBench_test

Evaluation: how well the model solves the task, judged on correctness, completeness, efficiency, and clarity of the solution • Self-reported

86.5%

MMMU-Pro

score • Self-reported

46.2%

MMMU (val)

score • Self-reported

64.5%

MM-Vet (GPT-4 Turbo)

score • Self-reported

74.0%

MTVQA

score • Self-reported

30.9%

MVBench

score • Self-reported

73.6%

OCRBench

Evaluation AI: ChatGPT 4o • Self-reported

87.7%

RealWorldQA

score • Self-reported

77.8%

TextVQA

score • Self-reported

85.5%

VCR_en_easy

Evaluation AI: ChatGPT (GPT-4) • Self-reported

91.9%
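The 75.8% average score listed under Key Specifications appears consistent with an unweighted mean of the fifteen self-reported results above. The page does not state how the average is computed, so the following is only a check under that assumption, with the scores copied from the listing.

```python
# Self-reported benchmark scores as listed on this page (percent).
scores = {
    "ChartQA": 88.3,
    "DocVQA (test)": 96.5,
    "EgoSchema": 77.9,
    "InfoVQA (test)": 84.5,
    "MathVista-Mini": 70.5,
    "MMBench_test": 86.5,
    "MMMU-Pro": 46.2,
    "MMMU (val)": 64.5,
    "MM-Vet (GPT-4 Turbo)": 74.0,
    "MTVQA": 30.9,
    "MVBench": 73.6,
    "OCRBench": 87.7,
    "RealWorldQA": 77.8,
    "TextVQA": 85.5,
    "VCR_en_easy": 91.9,
}

# ASSUMPTION: the catalog's "Average Score" is a plain arithmetic mean.
average = sum(scores.values()) / len(scores)

if __name__ == "__main__":
    print(f"{len(scores)} benchmarks, mean = {average:.1f}%")
```

Because the mean is unweighted, low outliers such as MTVQA (30.9%) and MMMU-Pro (46.2%) pull the headline figure well below the document-understanding scores.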

License & Metadata

License

tongyi_qianwen

Similar Models

Qwen3 VL 32B Thinking

Alibaba

Multimodal, 33.0B

Released: Sep 2025

Qwen2.5 VL 72B Instruct

Alibaba

Multimodal, 72.0B

Released: Jan 2025

Qwen2.5 VL 32B Instruct

Alibaba

Multimodal, 33.5B

Best score: 0.9 (HumanEval)

Released: Feb 2025

QvQ-72B-Preview

Alibaba

Multimodal, 73.4B

Released: Dec 2024

Qwen3.5-397B-A17B

Alibaba

Multimodal, 397.0B

Released: Feb 2026

Qwen2.5 VL 7B Instruct

Alibaba

Multimodal, 8.3B

Released: Jan 2025

Qwen2.5-Omni-7B

Alibaba

Multimodal, 7.0B

Best score: 0.8 (HumanEval)

Released: Mar 2025

Qwen3.5 35B A3B

Alibaba

35.0B

Released: Mar 2026

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.