DeepSeek VL2
Multimodal
An advanced series of large multimodal Mixture-of-Experts (MoE) vision-language models that significantly surpasses its predecessor, DeepSeek-VL. DeepSeek-VL2 demonstrates superior capabilities across a range of tasks, including visual question answering, optical character recognition, document/table/chart understanding, and visual grounding.
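A minimal usage sketch follows. It assumes (not confirmed by this page) that the checkpoint is published on Hugging Face as deepseek-ai/deepseek-vl2 and that its custom code is reachable through the standard transformers auto classes with trust_remote_code=True; the official DeepSeek-VL2 repository documents the exact processor and model entry points, which may differ.

# Minimal sketch: visual question answering with DeepSeek-VL2.
# Assumptions (not confirmed by this page): checkpoint id "deepseek-ai/deepseek-vl2",
# loadable via the standard transformers auto classes with trust_remote_code=True.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "deepseek-ai/deepseek-vl2"  # assumed model id

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

image = Image.open("invoice.png")  # any local document or chart image
question = "What is the total amount on this invoice?"

# Standard multimodal processor call; the model's custom processor may instead
# expect a chat/conversation-style input format.
inputs = processor(images=image, text=question, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])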
Key Specifications
Parameters
27.0B
Context
129.3K
Release Date
December 13, 2024
Average Score
70.9%
Timeline
Key dates in the model's history
Announcement
December 13, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
27.0B
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal • ZeroEval
Pricing & Availability
Input (per 1M tokens)
$9.50
Output (per 1M tokens)
$4800.00
Max Input Tokens
129.3K
Max Output Tokens
129.3K
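For quick budgeting, the per-1M-token prices above reduce to a simple per-request formula. The sketch below uses an illustrative helper (request_cost is not part of any API) and passes the listed input and output rates in as parameters.

# Minimal sketch: per-request cost from per-1M-token prices.
# request_cost() is an illustrative helper, not part of any API.
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the USD cost of one request given per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m + \
           (output_tokens / 1_000_000) * output_price_per_m

# Example: a 2,000-token prompt with a 500-token completion at the listed rates.
print(round(request_cost(2_000, 500, 9.50, 4800.00), 4))  # -> 2.419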
Supported Features
Function Calling • Structured Output • Code Execution • Web Search • Batch Inference • Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
Multimodal
Working with images and visual data
AI2D
test • Self-reported
ChartQA
test • Self-reported
DocVQA
test • Self-reported
MathVista
testmini • Self-reported
MMMU
val • Self-reported
Other Tests
Specialized benchmarks
InfoVQA
test • Self-reported
MMBench
test • Self-reported
MMBench-V1.1
test • Self-reported
MME
Standard evaluation • Self-reported
MMStar
Standard evaluation • Self-reported
MMT-Bench
Standard evaluation • Self-reported
OCRBench
Standard evaluation • Self-reported
RealWorldQA
Standard evaluation • Self-reported
TextVQA
val • Self-reported
License & Metadata
License
deepseek
Announcement Date
December 13, 2024
Last Updated
July 19, 2025
Similar Models
DeepSeek VL2 Small
DeepSeek
MM • 16.0B
Released: Dec 2024
DeepSeek VL2 Tiny
DeepSeek
MM • 3.0B
Released: Dec 2024
DeepSeek R1 Distill Qwen 14B
DeepSeek
14.8B
Best score: 0.6 (GPQA)
Released: Jan 2025
DeepSeek R1 Distill Llama 70B
DeepSeek
70.6B
Best score: 0.7 (GPQA)
Released: Jan 2025
Price: $0.10/1M tokens
DeepSeek R1 Distill Qwen 32B
DeepSeek
32.8B
Best score: 0.6 (GPQA)
Released: Jan 2025
Price: $0.12/1M tokens
Llama 3.2 90B Instruct
Meta
MM • 90.0B
Best score: 0.9 (MMLU)
Released: Sep 2024
Price: $1.20/1M tokens
Gemma 3 27B
Google
MM • 27.0B
Best score: 0.9 (HumanEval)
Released: Mar 2025
Price: $0.11/1M tokens
Mistral Small 3.1 24B Base
Mistral AI
MM • 24.0B
Best score: 0.8 (MMLU)
Released: Mar 2025
Price: $0.10/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.