QvQ-72B-Preview

Multimodal
Alibaba

An experimental research model focused on advanced visual reasoning and step-by-step problem solving. It performs strongly on multimodal science and math tasks, but has known limitations, including possible language mixing and recursive reasoning loops.

Key Specifications

Parameters
73.4B
Context
-
Release Date
December 25, 2024
Average Score
49.5%

Timeline

Key dates in the model's history
Announcement
December 25, 2024
Last Update
July 19, 2025

Technical Specifications

Parameters
73.4B
Training Tokens
-
Knowledge Cutoff
-
Family
-
Fine-tuned from
qwen2-vl-72b
Capabilities
Multimodal, ZeroEval

Benchmark Results

Model performance metrics across various tests and benchmarks

Multimodal

Working with images and visual data
MathVista
mini, Self-reported
71.4%
MMMU
val (validation split), Self-reported
70.3%

Other Tests

Specialized benchmarks
MathVision
Self-reported
35.9%
OlympiadBench
Self-reported
20.4%
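
The Average Score of 49.5% listed under Key Specifications appears consistent with an unweighted mean of the four self-reported scores above. That the page derives it this way is an assumption, and the `average_score` helper below is a hypothetical name used only for illustration. A minimal sketch of the arithmetic:

```python
# Self-reported scores as listed on this page, in percent.
scores = {
    "MathVista": 71.4,
    "MMMU": 70.3,
    "MathVision": 35.9,
    "OlympiadBench": 20.4,
}


def average_score(results):
    """Unweighted mean of the benchmark scores, rounded to one decimal.

    Assumption: every benchmark carries equal weight; the page does not
    state how its Average Score is actually computed.
    """
    return round(sum(results.values()) / len(results), 1)


print(f"{average_score(scores):.1f}%")
```

Because the mean is unweighted, the two lower scores (MathVision and OlympiadBench) pull the average well below the MathVista and MMMU results.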

License & Metadata

License
qwen
Announcement Date
December 25, 2024
Last Updated
July 19, 2025
