
Pixtral Large

Multimodal
Mistral AI

A 124-billion-parameter multimodal model built on Mistral Large 2, featuring state-of-the-art image understanding. It excels at document, chart, and natural-image understanding while maintaining strong text-only performance. The model pairs a 123-billion-parameter multimodal decoder with a 1-billion-parameter image encoder, and its 128K context window fits up to 30 high-resolution images.

Key Specifications

Parameters
124.0B
Context
128.0K
Release Date
November 18, 2024
Average Score
80.5%

Timeline

Key dates in the model's history
Announcement
November 18, 2024
Last Update
July 19, 2025

Technical Specifications

Parameters
124.0B
Training Tokens
-
Knowledge Cutoff
-
Family
-
Fine-tuned from
mistral-large-2-2407
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$2.00
Output (per 1M tokens)
$6.00
Max Input Tokens
128.0K
Max Output Tokens
128.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
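
At the rates above ($2.00 per million input tokens, $6.00 per million output tokens), the cost of a request is simple arithmetic; a minimal sketch with illustrative token counts:

```python
# Cost arithmetic from the prices above: $2.00 per 1M input tokens,
# $6.00 per 1M output tokens.
INPUT_PER_M, OUTPUT_PER_M = 2.00, 6.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-million-token rates."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

print(f"${request_cost(100_000, 10_000):.2f}")  # 100K in, 10K out -> $0.26
```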

Benchmark Results

Model performance metrics across various tests and benchmarks

Multimodal

Working with images and visual data
AI2D
BBox method: scoring based on a reasoning process with intermediate computations. It combines three components: 1. Chain-of-thought language that lets the model spell out its intermediate computation; for an arithmetic task, for example, the reasoning can look like "14 × 15. First 10 × 15 = 150. Then 4 × 15 = 60. We get 150 + 60 = 210, therefore 14 × 15 = 210" (written out as code below). 2. RLHF (reinforcement learning from human feedback): humans demonstrate or compare model outputs, and these judgments are used to tune the model with reinforcement learning. 3. Explained answers: alongside its answer, the model provides an explanation of how it arrived at it. Self-reported.
93.8%
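
The worked arithmetic from the chain-of-thought example above, written out as a short sketch (the decomposition mirrors the quoted reasoning):

```python
# Decompose 14 × 15 into 10 × 15 and 4 × 15, then sum the partial products,
# exactly as in the chain-of-thought example quoted above.
a, b = 14, 15
tens, ones = (a // 10) * 10, a % 10
partial_tens = tens * b   # 10 × 15 = 150
partial_ones = ones * b   # 4 × 15 = 60
total = partial_tens + partial_ones
print(f"{a} × {b}: first {tens} × {b} = {partial_tens}, "
      f"then {ones} × {b} = {partial_ones}; "
      f"{partial_tens} + {partial_ones} = {total}")
assert total == a * b  # 14 × 15 = 210
```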
ChartQA
Chain-of-thought reasoning: the model solves the task step by step to reach the correct answer. It first restates the task and identifies what needs to be found, then breaks the task into smaller components, applies the appropriate mathematical or logical method to each, shows all intermediate computations and the work at each step, and finally combines the results to obtain the answer (a prompt-template sketch follows below). Self-reported.
88.1%
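
A minimal sketch of a prompt template following the five-step procedure described above; the wording is an illustrative assumption, not the actual evaluation prompt:

```python
# Illustrative chain-of-thought prompt template; the phrasing is assumed,
# not taken from the ChartQA evaluation harness.
COT_TEMPLATE = """Solve the following task step by step.
1. Restate the task and identify what must be found.
2. Break the task into smaller components.
3. Apply the appropriate mathematical or logical method to each component.
4. Show every intermediate computation.
5. Combine the results and state the final answer.

Task: {task}"""

def build_cot_prompt(task: str) -> str:
    """Render the five-step chain-of-thought prompt for one task."""
    return COT_TEMPLATE.format(task=task)

print(build_cot_prompt("What is the largest value shown in the bar chart?"))
```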
DocVQA
ANLS (Average Normalized Levenshtein Similarity): a metric for evaluating the quality of information extraction from images or text. ANLS measures how close the model's answer is to the reference answer while tolerating small variations. For each question, the normalized Levenshtein similarity (NLS) is computed between the model's answer and each reference answer; NLS is one minus the edit distance divided by the length of the longer of the two strings. If the best NLS for a question falls below a threshold (usually 0.5), that question scores 0, which penalizes answers that stray too far from the references. ANLS is then the average of these per-question scores over the whole dataset. The metric is especially useful for tasks where minor spelling differences should not count as outright failures, for example question answering over documents or scene text (see the sketch below). Self-reported.
93.3%
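
A minimal sketch of the ANLS computation as defined above, assuming a single threshold of 0.5 and plain lowercase/strip normalization:

```python
# Minimal ANLS sketch: per-question best NLS against any reference,
# thresholded at tau, then averaged over the dataset.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def nls(pred: str, ref: str) -> float:
    """Normalized Levenshtein similarity: 1 - distance / longer length."""
    pred, ref = pred.strip().lower(), ref.strip().lower()
    denom = max(len(pred), len(ref)) or 1
    return 1.0 - levenshtein(pred, ref) / denom

def anls(predictions: list[str], references: list[list[str]],
         tau: float = 0.5) -> float:
    """Average of the best thresholded NLS per question."""
    scores = []
    for pred, refs in zip(predictions, references):
        best = max(nls(pred, r) for r in refs)
        scores.append(best if best >= tau else 0.0)
    return sum(scores) / len(scores)

print(anls(["mistral large"], [["Mistral Large", "Pixtral"]]))  # 1.0
```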
MathVista
Chain-of-thought prompt: the model is given the question and asked to solve it using a chain-of-thought approach, reasoning step by step first and then stating the final answer. Self-reported.
69.4%
MMMU
CoT (chain-of-thought). Self-reported.
64.0%

Other Tests

Specialized benchmarks
MM-MT-Bench
GPT-4o judge: GPT-4o is used as a judge to evaluate answers to mathematical tasks, combining rubric-based scoring with error-checking. It works as follows: 1. The question, the candidate answer, and the reference solution are given to GPT-4o, which is asked to evaluate the solution. 2. To keep grading consistent and catch mistakes, three key techniques are used: a. the judge explicitly verifies each step of the solution; b. it looks for the kinds of errors a human grader would find; c. it compares against the reference solution. 3. The judge assigns a score on a scale from 0 to 5, where each score has a defined meaning: 5: complete and correct solution; 4: correct overall with minor issues; 3: correct approach with errors; 2: partial progress toward a solution; 1: minimal progress; 0: wrong approach or no solution. The GPT-4o judge shows high agreement with expert graders, reaching 83% agreement in our tests, which makes it a practical tool for evaluating mathematical solutions, especially at higher difficulty levels (a hedged sketch follows below). Self-reported.
74.0%
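
A hedged sketch of an LLM-as-judge call in the spirit of the description above, using the OpenAI Python client; the rubric wording and the `judge` helper are illustrative assumptions, not Mistral's actual harness:

```python
# Illustrative judge loop: the 0-5 rubric mirrors the description above,
# but the prompt text and helper are assumptions, not the real pipeline.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = """Score the candidate solution from 0 to 5:
5: complete and correct; 4: correct overall with minor issues;
3: correct approach with errors; 2: partial progress;
1: minimal progress; 0: wrong approach or no solution.
Verify each step explicitly, note errors a human grader would catch,
and compare against the reference solution. Reply with the score only."""

def judge(question: str, reference: str, candidate: str) -> int:
    """Ask GPT-4o to grade one candidate solution against a reference."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Question:\n{question}\n\n"
                                        f"Reference solution:\n{reference}\n\n"
                                        f"Candidate solution:\n{candidate}"},
        ],
    )
    return int(resp.choices[0].message.content.strip())
```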
VQAv2
VQA Match: generated answers are evaluated by comparing them with reference answers. The process has three key steps: 1. Normalization: answers are preprocessed and brought to a common format so the comparison is not skewed by surface differences. 2. Embedding: the normalized answers are embedded with the CLIP ViT-L/14 model, which was trained contrastively to produce text representations. 3. Comparison: the score is the similarity between the embedding of the model's answer and that of the reference; values close to 1.0 indicate a near-exact match. VQA Match thus measures semantic agreement rather than plain string equality, giving a more robust assessment of answer quality, and it has been found to correlate well with human judgments (a sketch follows below). Self-reported.
80.9%
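
A minimal sketch of embedding-based answer matching as described above, using CLIP ViT-L/14 text features from the `transformers` library; the lowercase/strip normalization here is a simplifying assumption:

```python
# Embed both answers with CLIP ViT-L/14 text features and compare them
# with cosine similarity; values near 1.0 indicate a near-exact match.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

def vqa_match(prediction: str, reference: str) -> float:
    """Cosine similarity between CLIP text embeddings of two answers."""
    texts = [prediction.strip().lower(), reference.strip().lower()]
    inputs = tokenizer(texts, padding=True, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize
    return float(feats[0] @ feats[1])

print(vqa_match("two dogs", "2 dogs"))
```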

License & Metadata

License
Mistral Research License (MRL) for research; Mistral Commercial License for commercial use
Announcement Date
November 18, 2024
Last Updated
July 19, 2025

Similar Models


Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.