
Pixtral Large

Multimodal
Mistral AI

A 124-billion-parameter multimodal model built on Mistral Large 2, featuring state-of-the-art image understanding. It excels at document, chart, and natural-image understanding while maintaining strong text-only performance. The model pairs a 123-billion-parameter multimodal decoder with a 1-billion-parameter image encoder, and its 128K context window fits up to 30 high-resolution images.

Key Specifications

Parameters
124.0B
Context
128.0K
Release Date
November 18, 2024
Average Score
80.5%

Timeline

Key dates in the model's history
Announcement
November 18, 2024
Last Update
July 19, 2025

Technical Specifications

Parameters
124.0B
Training Tokens
-
Knowledge Cutoff
-
Family
-
Fine-tuned from
mistral-large-2-2407
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$2.00
Output (per 1M tokens)
$6.00
Max Input Tokens
128.0K
Max Output Tokens
128.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
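
At the rates above ($2.00 per million input tokens, $6.00 per million output tokens), the cost of a request is simple arithmetic; a minimal sketch with illustrative token counts:

```python
# Cost arithmetic from the prices above: $2.00 per 1M input tokens,
# $6.00 per 1M output tokens.
INPUT_PER_M, OUTPUT_PER_M = 2.00, 6.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-million-token rates."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

print(f"${request_cost(100_000, 10_000):.2f}")  # 100K in, 10K out -> $0.26
```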

Benchmark Results

Model performance metrics across various tests and benchmarks

Multimodal

Working with images and visual data
AI2D
BBox method: scoring based on a reasoning process with intermediate computations. It combines three components: 1. Chain-of-thought language that lets the model spell out its intermediate computation; for an arithmetic task, for example, the reasoning can look like "14 × 15. First 10 × 15 = 150. Then 4 × 15 = 60. We get 150 + 60 = 210, therefore 14 × 15 = 210" (written out as code below). 2. RLHF (reinforcement learning from human feedback): humans demonstrate or compare model outputs, and these judgments are used to tune the model with reinforcement learning. 3. Explained answers: alongside its answer, the model provides an explanation of how it arrived at it. Self-reported.
93.8%
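
The worked arithmetic from the chain-of-thought example above, written out as a short sketch (the decomposition mirrors the quoted reasoning):

```python
# Decompose 14 × 15 into 10 × 15 and 4 × 15, then sum the partial products,
# exactly as in the chain-of-thought example quoted above.
a, b = 14, 15
tens, ones = (a // 10) * 10, a % 10
partial_tens = tens * b   # 10 × 15 = 150
partial_ones = ones * b   # 4 × 15 = 60
total = partial_tens + partial_ones
print(f"{a} × {b}: first {tens} × {b} = {partial_tens}, "
      f"then {ones} × {b} = {partial_ones}; "
      f"{partial_tens} + {partial_ones} = {total}")
assert total == a * b  # 14 × 15 = 210
```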
ChartQA
Chain-of-thought reasoning: the model solves the task step by step to reach the correct answer. It first restates the task and identifies what needs to be found, then breaks the task into smaller components, applies the appropriate mathematical or logical method to each, shows all intermediate computations and the work at each step, and finally combines the results to obtain the answer (a prompt-template sketch follows below). Self-reported.
88.1%
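
A minimal sketch of a prompt template following the five-step procedure described above; the wording is an illustrative assumption, not the actual evaluation prompt:

```python
# Illustrative chain-of-thought prompt template; the phrasing is assumed,
# not taken from the ChartQA evaluation harness.
COT_TEMPLATE = """Solve the following task step by step.
1. Restate the task and identify what must be found.
2. Break the task into smaller components.
3. Apply the appropriate mathematical or logical method to each component.
4. Show every intermediate computation.
5. Combine the results and state the final answer.

Task: {task}"""

def build_cot_prompt(task: str) -> str:
    """Render the five-step chain-of-thought prompt for one task."""
    return COT_TEMPLATE.format(task=task)

print(build_cot_prompt("What is the largest value shown in the bar chart?"))
```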
DocVQA
ANLS (Average Normalized Levenshtein Similarity): a metric for evaluating the quality of information extraction from images or text. ANLS measures how close the model's answer is to the reference answer while tolerating small variations. For each question, the normalized Levenshtein similarity (NLS) is computed between the model's answer and each reference answer; NLS is one minus the edit distance divided by the length of the longer of the two strings. If the best NLS for a question falls below a threshold (usually 0.5), that question scores 0, which penalizes answers that stray too far from the references. ANLS is then the average of these per-question scores over the whole dataset. The metric is especially useful for tasks where minor spelling differences should not count as outright failures, for example question answering over documents or scene text (see the sketch below). Self-reported.
93.3%
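
A minimal sketch of the ANLS computation as defined above, assuming a single threshold of 0.5 and plain lowercase/strip normalization:

```python
# Minimal ANLS sketch: per-question best NLS against any reference,
# thresholded at tau, then averaged over the dataset.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def nls(pred: str, ref: str) -> float:
    """Normalized Levenshtein similarity: 1 - distance / longer length."""
    pred, ref = pred.strip().lower(), ref.strip().lower()
    denom = max(len(pred), len(ref)) or 1
    return 1.0 - levenshtein(pred, ref) / denom

def anls(predictions: list[str], references: list[list[str]],
         tau: float = 0.5) -> float:
    """Average of the best thresholded NLS per question."""
    scores = []
    for pred, refs in zip(predictions, references):
        best = max(nls(pred, r) for r in refs)
        scores.append(best if best >= tau else 0.0)
    return sum(scores) / len(scores)

print(anls(["mistral large"], [["Mistral Large", "Pixtral"]]))  # 1.0
```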
MathVista
Chain-of-thought prompt: the model is given the question and asked to solve it using a chain-of-thought approach, reasoning step by step first and then stating the final answer. Self-reported.
69.4%
MMMU
CoT (chain-of-thought). Self-reported.
64.0%

Other Tests

Specialized benchmarks
MM-MT-Bench
GPT-4o judge: GPT-4o is used as a judge to evaluate answers to mathematical tasks, combining rubric-based scoring with error-checking. It works as follows: 1. The question, the candidate answer, and the reference solution are given to GPT-4o, which is asked to evaluate the solution. 2. To keep grading consistent and catch mistakes, three key techniques are used: a. the judge explicitly verifies each step of the solution; b. it looks for the kinds of errors a human grader would find; c. it compares against the reference solution. 3. The judge assigns a score on a scale from 0 to 5, where each score has a defined meaning: 5: complete and correct solution; 4: correct overall with minor issues; 3: correct approach with errors; 2: partial progress toward a solution; 1: minimal progress; 0: wrong approach or no solution. The GPT-4o judge shows high agreement with expert graders, reaching 83% agreement in our tests, which makes it a practical tool for evaluating mathematical solutions, especially at higher difficulty levels (a hedged sketch follows below). Self-reported.
74.0%
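
A hedged sketch of an LLM-as-judge call in the spirit of the description above, using the OpenAI Python client; the rubric wording and the `judge` helper are illustrative assumptions, not Mistral's actual harness:

```python
# Illustrative judge loop: the 0-5 rubric mirrors the description above,
# but the prompt text and helper are assumptions, not the real pipeline.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = """Score the candidate solution from 0 to 5:
5: complete and correct; 4: correct overall with minor issues;
3: correct approach with errors; 2: partial progress;
1: minimal progress; 0: wrong approach or no solution.
Verify each step explicitly, note errors a human grader would catch,
and compare against the reference solution. Reply with the score only."""

def judge(question: str, reference: str, candidate: str) -> int:
    """Ask GPT-4o to grade one candidate solution against a reference."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Question:\n{question}\n\n"
                                        f"Reference solution:\n{reference}\n\n"
                                        f"Candidate solution:\n{candidate}"},
        ],
    )
    return int(resp.choices[0].message.content.strip())
```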
VQAv2
VQA Match: generated answers are evaluated by comparing them with reference answers. The process has three key steps: 1. Normalization: answers are preprocessed and brought to a common format so the comparison is not skewed by surface differences. 2. Embedding: the normalized answers are embedded with the CLIP ViT-L/14 model, which was trained contrastively to produce text representations. 3. Comparison: the score is the similarity between the embedding of the model's answer and that of the reference; values close to 1.0 indicate a near-exact match. VQA Match thus measures semantic agreement rather than plain string equality, giving a more robust assessment of answer quality, and it has been found to correlate well with human judgments (a sketch follows below). Self-reported.
80.9%
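
A minimal sketch of embedding-based answer matching as described above, using CLIP ViT-L/14 text features from the `transformers` library; the lowercase/strip normalization here is a simplifying assumption:

```python
# Embed both answers with CLIP ViT-L/14 text features and compare them
# with cosine similarity; values near 1.0 indicate a near-exact match.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

def vqa_match(prediction: str, reference: str) -> float:
    """Cosine similarity between CLIP text embeddings of two answers."""
    texts = [prediction.strip().lower(), reference.strip().lower()]
    inputs = tokenizer(texts, padding=True, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize
    return float(feats[0] @ feats[1])

print(vqa_match("two dogs", "2 dogs"))
```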

License & Metadata

License
Mistral Research License (MRL) for research; Mistral Commercial License for commercial use
Announcement Date
November 18, 2024
Last Updated
July 19, 2025

Similar Models


Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.