QvQ-72B-Preview

Name: QvQ-72B-Preview
Author: Alibaba

Multimodal

An experimental research model focused on advanced visual reasoning and step-by-step cognitive abilities. It performs strongly on multimodal math and science tasks but has known limitations, including possible language mixing and recursive reasoning loops.

Key Specifications

Parameters

73.4B

Release Date

December 25, 2024

Average Score

49.5%

Timeline

Key dates in the model's history

Announcement

December 25, 2024

Last Update

July 19, 2025

Technical Specifications

Parameters

73.4B

Fine-tuned from

qwen2-vl-72b

Capabilities

Multimodal • ZeroEval

Benchmark Results

Model performance metrics across various tests and benchmarks

Multimodal

Working with images and visual data

MathVista

mini • Self-reported

71.4%

MMMU

val • Self-reported

70.3%

Other Tests

Specialized benchmarks

MathVision

Self-reported

35.9%

OlympiadBench

The test report includes the output for each query, the result of each test, and the final score for each task, each task set, and overall. It also includes execution time and computational cost. This is the most detailed report format, containing the most information about the model's performance; it is useful for analysis and especially important when comparing performance with other models • Self-reported

20.4%
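The Average Score of 49.5% shown under Key Specifications matches the unweighted mean of the four self-reported scores listed here. The page does not say how the average is computed, so the sketch below assumes a plain arithmetic mean; the dictionary labels are illustrative only.

```python
# Assumption: "Average Score" is the plain arithmetic mean of the four
# self-reported benchmark scores on this page (the page does not state this).
from statistics import mean

scores = {
    "MathVista (mini)": 71.4,
    "MMMU (val)": 70.3,
    "MathVision": 35.9,
    "OlympiadBench": 20.4,
}

# Round to one decimal place, matching how scores are displayed on the page.
average = round(mean(scores.values()), 1)
print(f"{average}%")  # prints 49.5%
```

If the site weighted benchmarks differently or included additional hidden results, this check would not hold; it only shows that the displayed figures are consistent with a simple mean.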

License & Metadata

License

qwen

Announcement Date

December 25, 2024

Last Updated

July 19, 2025

Similar Models

Qwen2-VL-72B-Instruct

Alibaba

MM • 73.4B

Released: Aug 2024

Qwen2.5 VL 72B Instruct

Alibaba

MM • 72.0B

Released: Jan 2025

Qwen3 VL 32B Thinking

Alibaba

MM • 33.0B

Released: Sep 2025

Qwen2.5 VL 32B Instruct

Alibaba

MM • 33.5B

Best score: 0.9 (HumanEval)

Released: Feb 2025

Qwen3.5-397B-A17B

Alibaba

MM • 397.0B

Released: Feb 2026

Qwen2.5 VL 7B Instruct

Alibaba

MM • 8.3B

Released: Jan 2025

Qwen2.5-Omni-7B

Alibaba

MM • 7.0B

Best score: 0.8 (HumanEval)

Released: Mar 2025

Qwen3.5 35B A3B

Alibaba

35.0B

Released: Mar 2026

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.