QvQ-72B-Preview
Multimodal
An experimental research model focused on advanced visual reasoning and step-by-step reasoning. It performs strongly on multimodal science and math tasks, though it has limitations such as potential language mixing and recursive reasoning loops.
Key Specifications
Parameters
73.4B
Context
-
Release Date
December 25, 2024
Average Score
49.5%
Timeline
Key dates in the model's history
Announcement
December 25, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
73.4B
Training Tokens
-
Knowledge Cutoff
-
Family
-
Fine-tuned from
qwen2-vl-72b
Capabilities
Multimodal
ZeroEval
Benchmark Results
Model performance metrics across various tests and benchmarks
Multimodal
Working with images and visual data
MathVista
mini • Self-reported
MMMU
Verification of intermediate values (val): this technique checks the intermediate steps of a solution to detect where errors occur and which steps can be relied on. An LLM can check its own computations or methods with high accuracy. Verifying computations involves: 1. Identifying the parts of the problem that need to be verified. 2. Using an alternative approach or method for the check. 3. Comparing the results with the original solution. 4. Identifying the parts of the solution that lead to errors. 5. Correcting the solution in line with the verification results. This is especially useful when a step involves computation; alternative methods are used when a solution is difficult to verify directly. Checking intermediate values is well suited to specific steps of a solution. • Self-reported
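The five-step intermediate-value check described above can be illustrated with a small Python sketch. It assumes a solution is a list of self-contained arithmetic steps, each with a claimed value; the names `Step`, `recompute`, `first_failing_step`, and `corrected` are hypothetical, and an AST-based re-evaluation stands in for the "alternative method". This is not how QvQ or the MMMU benchmark actually verifies answers.

```python
import ast
import operator
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """Hypothetical solution step: an arithmetic expression and the value
    the solver claimed for it."""
    expression: str
    claimed: float


# Operators the independent re-computation understands.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def recompute(expression: str) -> float:
    """Re-evaluate an arithmetic expression by walking its AST (no eval())."""

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval").body)


def first_failing_step(steps: List[Step], tol: float = 1e-9) -> Optional[int]:
    """Steps 1-4: re-check each step with the alternative method, compare,
    and return the index of the first disagreeing step (None if all agree)."""
    for index, step in enumerate(steps):
        if abs(recompute(step.expression) - step.claimed) > tol:
            return index
    return None


def corrected(steps: List[Step]) -> List[Step]:
    """Step 5: replace every claimed value with its re-computed value."""
    return [Step(s.expression, recompute(s.expression)) for s in steps]
```

For example, with the steps `12 * 7 = 84`, `84 + 16 = 100`, and `100 / 8 = 12`, `first_failing_step` returns index 2, because the re-computation gives 12.5. Walking the AST instead of calling `eval()` keeps the checker from executing arbitrary code.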
Other Tests
Specialized benchmarks
MathVision
Title: … in GPT models for … and improvements. Affiliations: (1) Anthropic; (2) Technion, 3200003; (3) CA 94305; (4) CA 94720. Abstract: Understanding language models' behavior often requires access to the model, which is typically available only through an API. We propose FOCUS, a new method for extraction from LLMs that requires only … and the model. FOCUS measures the model's "…", which can show how it functions, through a process of "…" in queries. We evaluate FOCUS with Claude 2 and Claude 3 models and show its effectiveness on tasks including logical reasoning, knowledge, question understanding and answering, and the model's decision-making processes. Comparison with other methods, including … analysis approaches, shows that FOCUS can be more accurate on specific tasks. We also show that FOCUS mainly uses … and that a variant of our method can … internal …. • Self-reported
OlympiadBench
The testing report includes the output for each query, the result of each check or test, and the final score for each task, each task set, and overall. It also includes execution time and the computational resources used. This is the most detailed format, containing as much information as possible about the model's performance. It is useful for analysis and especially important when comparing performance with other models. • Self-reported
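The report layout described for OlympiadBench (per-query output, per-test result, per-task score, per-set score, overall score, and execution time) could be modeled as nested records. The Python sketch below is an assumption rather than the benchmark's real format: the class and field names are hypothetical, and every score is taken to be an unweighted mean of pass/fail results, which may not match how OlympiadBench actually aggregates.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryResult:
    query: str
    output: str      # raw model output for the query
    passed: bool     # outcome of the check/test applied to that output
    seconds: float   # execution time for the query


def _mean(values: List[float]) -> float:
    # Unweighted mean; an empty group scores 0.0 rather than raising.
    return sum(values) / len(values) if values else 0.0


@dataclass
class TaskReport:
    name: str
    results: List[QueryResult] = field(default_factory=list)

    @property
    def score(self) -> float:
        # Hypothetical: fraction of queries whose test passed.
        return _mean([1.0 if r.passed else 0.0 for r in self.results])

    @property
    def seconds(self) -> float:
        return sum(r.seconds for r in self.results)


@dataclass
class TaskSetReport:
    name: str
    tasks: List[TaskReport] = field(default_factory=list)

    @property
    def score(self) -> float:
        return _mean([t.score for t in self.tasks])

    @property
    def seconds(self) -> float:
        return sum(t.seconds for t in self.tasks)


@dataclass
class EvaluationReport:
    task_sets: List[TaskSetReport] = field(default_factory=list)

    @property
    def score(self) -> float:
        # Overall score: unweighted mean over task sets (an assumption).
        return _mean([s.score for s in self.task_sets])

    @property
    def seconds(self) -> float:
        return sum(s.seconds for s in self.task_sets)
```

Deriving scores and timings through properties keeps the raw per-query records as the single source of truth, so the task, set, and overall figures cannot drift out of sync with them.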
License & Metadata
License
qwen
Announcement Date
December 25, 2024
Last Updated
July 19, 2025
Similar Models
Qwen2-VL-72B-Instruct
Alibaba
MM • 73.4B
Released: Aug 2024
Qwen2.5 VL 72B Instruct
Alibaba
MM • 72.0B
Released: Jan 2025
Qwen3 VL 32B Thinking
Alibaba
MM • 33.0B
Released: Sep 2025
Qwen2.5 VL 32B Instruct
Alibaba
MM • 33.5B
Best score: 0.9 (HumanEval)
Released: Feb 2025
Qwen3.5-397B-A17B
Alibaba
MM • 397.0B
Released: Feb 2026
Qwen2.5 VL 7B Instruct
Alibaba
MM • 8.3B
Released: Jan 2025
Qwen2.5-Omni-7B
Alibaba
MM • 7.0B
Best score: 0.8 (HumanEval)
Released: Mar 2025
Qwen3.5 35B A3B
Alibaba
35.0B
Released: Mar 2026
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.
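The recommendation rule above (similarity of developer organization, multimodality, parameter size, and benchmark performance) can be illustrated with a weighted score. This Python sketch is a hypothetical stand-in, not the catalog's actual algorithm: the `ModelCard` fields, the weights, and the log-scale size comparison are assumptions. The benchmark term applies only when both models report an average score.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelCard:
    name: str
    organization: str
    multimodal: bool
    params_b: float                    # parameter count in billions
    avg_score: Optional[float] = None  # average benchmark score in [0, 1], if known


# Hypothetical weights; the catalog does not publish how it combines features.
WEIGHTS = {"organization": 0.3, "modality": 0.3, "size": 0.2, "benchmarks": 0.2}


def similarity(a: ModelCard, b: ModelCard) -> float:
    score = WEIGHTS["organization"] * (a.organization == b.organization)
    score += WEIGHTS["modality"] * (a.multimodal == b.multimodal)
    # Compare sizes on a log10 scale: a 10x gap (or more) contributes nothing.
    size_gap = abs(math.log10(a.params_b) - math.log10(b.params_b))
    score += WEIGHTS["size"] * max(0.0, 1.0 - size_gap)
    # Benchmark term only when both models report an average score.
    if a.avg_score is not None and b.avg_score is not None:
        score += WEIGHTS["benchmarks"] * (1.0 - abs(a.avg_score - b.avg_score))
    return score


def recommend(target: ModelCard, catalog: List[ModelCard], k: int = 3) -> List[str]:
    """Names of the k most similar models, ties broken alphabetically."""
    scored = [(similarity(target, m), m.name) for m in catalog if m.name != target.name]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


# Example using figures from this page (49.5% average score for QvQ-72B-Preview).
qvq = ModelCard("QvQ-72B-Preview", "Alibaba", True, 73.4, avg_score=0.495)
catalog = [
    ModelCard("Qwen2-VL-72B-Instruct", "Alibaba", True, 73.4),
    ModelCard("Qwen2.5 VL 72B Instruct", "Alibaba", True, 72.0),
    ModelCard("Qwen3 VL 32B Thinking", "Alibaba", True, 33.0),
    ModelCard("Qwen2.5 VL 7B Instruct", "Alibaba", True, 8.3),
]
print(recommend(qvq, catalog))
```

With these entries the sketch ranks Qwen2-VL-72B-Instruct, Qwen2.5 VL 72B Instruct, and Qwen3 VL 32B Thinking highest. The log scale treats a 7B-versus-14B gap like a 36B-versus-72B gap, which matches how model sizes are usually compared.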