Pixtral-12B
Multimodal model with 12 billion parameters and a 400-million-parameter vision encoder, capable of understanding both natural images and documents. It excels at multimodal tasks while maintaining high-quality text-only performance. Supports images of various sizes and multiple images in context.
Key Specifications
Parameters
12.4B
Context
128.0K
Release Date
September 17, 2024
Average Score
66.8%
Timeline
Key dates in the model's history
Announcement
September 17, 2024
Last Update
July 19, 2025
Today
March 26, 2026
Technical Specifications
Parameters
12.4B
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$0.15
Output (per 1M tokens)
$0.15
Max Input Tokens
128.0K
Max Output Tokens
8.2K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
5-shot • Self-reported
Programming
Programming skills tests
HumanEval
Pass@1 Metric: Pass@1 evaluates how many problems from a set the model can solve on its first attempt (with a single generated solution). For each task, only the model's first solution counts. The Pass@1 value shows what percentage of tasks the model solves directly, without the ability to correct its answer or make several attempts. This is a strict metric, since it does not let the model iterate on or refine its output. A high Pass@1 value indicates the model's ability to give correct answers immediately, which is especially important in scenarios where users expect fast and accurate results without follow-up queries • Self-reported
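To make the definition concrete, below is a minimal sketch of how a Pass@1 score can be computed when each task receives exactly one generated solution. It is illustrative only; the function and the `solves` checker are hypothetical stand-ins, not the official HumanEval harness.

```python
# Minimal Pass@1 sketch: one generated solution per task.
# `solves` is a hypothetical checker, not the official HumanEval harness.
from typing import Callable, Sequence


def pass_at_1(tasks: Sequence[str], solves: Callable[[str], bool]) -> float:
    """Fraction of tasks whose single first-attempt solution is correct."""
    if not tasks:
        return 0.0
    return sum(1 for task in tasks if solves(task)) / len(tasks)


# Example: 3 of 4 tasks solved on the first attempt -> Pass@1 = 0.75
results = {"t1": True, "t2": True, "t3": False, "t4": True}
print(pass_at_1(list(results), lambda t: results[t]))  # 0.75
```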
Mathematics
Mathematical problems and computations
MATH
Pass@1: In model evaluation, especially for problem-solving tasks, Pass@1 is the percentage of tasks the model solves correctly on its first attempt. It is a strict measure of performance that does not allow the model several solution attempts or a chance to revise its answer. If the model produces one solution for each of N tasks and k of those solutions are correct, then Pass@1 = k/N. In the context of coding or mathematical tasks, where correctness is binary (a solution is either correct or not), Pass@1 provides an unambiguous score that does not reward retries or self-correction. Unlike Pass@k-style metrics, Pass@1 measures the model's baseline reliability when performing tasks without the ability to verify or revise its output • Self-reported
Multimodal
Working with images and visual data
ChartQA
Chain of Thought (CoT): The model works through the problem step by step. First it analyzes the task to understand what is required, then solves it, explaining its reasoning at each stage. For mathematical tasks it breaks the problem into components, finds the relevant relationships, and works toward a solution. For reasoning tasks it lays out its logic, considers different interpretations, and accounts for all aspects of the problem. Chain-of-thought reasoning helps catch errors, organize the model's thinking, and lead to correct answers: by recording each step, the model can track its own process and detect mistakes or incorrect assumptions. This method is especially useful for complex tasks requiring multi-step reasoning • Self-reported
DocVQA
ANLS (Average Normalized Levenshtein Similarity): a metric used to evaluate the quality of answers in VQA (Visual Question Answering) tasks such as DocVQA. It uses a normalized Levenshtein similarity (NLS) function, which is better suited to scoring free-form answers than exact matching. ANLS measures the similarity between the predicted answer and the reference answer, tolerating small differences that do not affect correctness (for example, "1990" versus "1990 year"). This makes it a more forgiving, user-oriented metric for evaluating question-answering systems. Values range from 0 to 1, where values close to 1 indicate a closer match between the predicted and reference answers • Self-reported
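The sketch below illustrates the idea with a plain edit-distance implementation; it is a simplified version assuming a single reference answer per question and the commonly used 0.5 threshold, and is not the official DocVQA scorer.

```python
# Simplified ANLS sketch (single reference answer per question; the official
# DocVQA scorer handles multiple ground-truth answers and extra normalization).
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def anls(predictions: list[str], references: list[str], tau: float = 0.5) -> float:
    """Average Normalized Levenshtein Similarity; similarities below tau are zeroed."""
    scores = []
    for pred, ref in zip(predictions, references):
        pred, ref = pred.strip().lower(), ref.strip().lower()
        sim = 1.0 - levenshtein(pred, ref) / max(len(pred), len(ref), 1)
        scores.append(sim if sim >= tau else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


print(anls(["1990"], ["1990"]))  # 1.0: exact match
print(anls(["199o"], ["1990"]))  # 0.75: one character differs, still above the threshold
```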
MathVista
Chain of Thought (CoT): Chain-of-thought prompting is a technique that asks the model to solve a task step by step, explicitly showing the intermediate steps of its reasoning. Instead of answering immediately, the model produces a sequence of logical steps leading to the result. This is especially useful for tasks that require several reasoning steps, such as mathematical problems, logic puzzles, and assignments requiring analysis. Research shows that prompts like "let's think step by step" or similar instructions can significantly improve model performance without any additional fine-tuning. CoT is especially effective on complex tasks and can be combined with other methods such as Self-Consistency, where the model generates several reasoning chains and takes the most frequent result • Self-reported
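As a hedged illustration of the prompting pattern (the question and wording below are made up for this example, not taken from MathVista), chain-of-thought prompting only changes how the question is posed:

```python
# Illustrative chain-of-thought prompt construction; the question is a made-up
# example and no specific model API is assumed here.
question = (
    "A chart shows sales of 120, 150, and 180 units over three months. "
    "By what percentage did sales grow from the first month to the last?"
)

direct_prompt = question
cot_prompt = question + "\n\nLet's think step by step before giving the final answer."

# With the CoT prompt the model is expected to spell out intermediate steps, e.g.:
#   1. First month: 120 units; last month: 180 units.
#   2. Increase: 180 - 120 = 60 units.
#   3. Percentage growth: 60 / 120 = 0.5, i.e. 50%.
# and only then state the final answer (50%).
```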
MMMU
Chain of Thought (CoT) • Self-reported
Other Tests
Specialized benchmarks
IFEval
Text Instruction Following Score: To evaluate the model's ability to follow instructions, we measure how well the model adheres to the specific instructions given in the prompt when producing its answer. These tasks combine constrained output with general knowledge. For example, the model may be given three items and asked to use them in its answer, and also to explain why the answer contains only those three. Tasks are scored on two criteria: 1. Accuracy: the factual information is correct. 2. Format: the answer follows the exact format requested. This score is based on the approach used in MT-Bench • Self-reported
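Many instruction-following constraints of this kind are mechanically verifiable. The snippet below is a minimal sketch of one hypothetical format check (exactly three bullet points); the actual benchmark uses a much larger set of rules and also scores factual accuracy separately.

```python
# Hypothetical verifiable-format check in the spirit of IFEval; the real
# benchmark covers many rule types and pairs format checks with accuracy scoring.
def has_exactly_n_bullets(answer: str, n: int = 3) -> bool:
    """True if the answer contains exactly n lines starting with '- '."""
    bullets = [line for line in answer.splitlines() if line.lstrip().startswith("- ")]
    return len(bullets) == n


answer = (
    "- Paris is the capital of France.\n"
    "- It lies on the Seine.\n"
    "- Its population is roughly two million."
)
print(has_exactly_n_bullets(answer))  # True: exactly three bullet points
```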
MM IF-Eval
Multimodal Instruction Following Evaluation: This evaluation measures how well the model understands and follows complex instructions that include both text and images. We evaluate the model on its ability to interpret images, follow instructions that reference visual information, and reason on the basis of visual data. Example tasks ask the model to describe what is in an image and then suggest possible applications of the depicted object, to answer conditionally depending on what appears in the image, or to find and correct errors in mathematical content shown in an image. Scoring methodology: each task is scored on a scale from 0 to 5, based on the accuracy of instruction execution and the quality of the reasoning applied. This metric is especially important for models used in tasks where visual context can be critical for a correct answer • Self-reported
MM-MT-Bench
Multimodal MT-Bench Score • Self-reported
MT-Bench
Text MT-Bench Score: The MT-Bench evaluation measures the quality of the model's language abilities across a set of tasks designed to test various aspects of language understanding and generation. A high MT-Bench score indicates that the model handles tasks such as reasoning, generalization, and question answering well, meaning it demonstrates solid language understanding and can generate accurate and coherent answers. MT-Bench scores can be interpreted as follows: • Above 8.0: excellent performance, on par with leading AI models • 7.0-8.0: strong performance with good language understanding • 6.0-7.0: decent performance with some limitations • 5.0-6.0: moderate performance with noticeable weaknesses • Below 5.0: weak performance that may not cope with complex tasks. Comparing MT-Bench scores across models can help choose the most suitable model for a specific use case, especially when performance on particular language tasks matters • Self-reported
VQAv2
VQA Match: a metric that measures the quality of AI models on visual question answering (VQA) tasks. Method: unlike tasks with a fixed answer set or yes/no questions, the VQA Match metric is applied to open-ended answers to questions about images. The metric yields a value from 0 to 1 reflecting the degree of agreement between the model's answer and the reference answer. Evaluation process: 1. For a given predicted answer a and reference answer â, a similarity function sim(a, â) compares the answers to produce a score. 2. Three levels of matching are used: • Exact match: a score of 1 if the answers are identical • Partial match: partial similarity between the answers • Semantic match: used for free-form answers, based on natural-language-processing algorithms. Advantages: works with various answer types, including numerical ones, and scales to large datasets and diverse application domains. Application: this metric is used for evaluation in visual question-answering tasks, allowing comparison of different machine-learning models that work with images and text • Self-reported
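As a rough illustration of the exact/partial matching idea described above, here is a simplified scorer. It is an assumption-laden sketch, not the official VQAv2 protocol, which aggregates agreement across multiple human answers and applies its own answer-normalization rules.

```python
# Simplified VQA answer-matching sketch (illustrative only; the official VQAv2
# metric aggregates agreement across several human answers and normalizes text).
def vqa_match(prediction: str, reference: str) -> float:
    """1.0 for an exact match after lowercasing; otherwise a crude
    token-overlap score in [0, 1] standing in for partial matching."""
    pred, ref = prediction.strip().lower(), reference.strip().lower()
    if pred == ref:
        return 1.0
    pred_tokens, ref_tokens = set(pred.split()), set(ref.split())
    if not pred_tokens or not ref_tokens:
        return 0.0
    return len(pred_tokens & ref_tokens) / len(pred_tokens | ref_tokens)


print(vqa_match("two dogs", "two dogs"))   # 1.0: exact match
print(vqa_match("a red car", "red car"))   # ~0.67: partial token overlap
```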
License & Metadata
License
Apache 2.0
Announcement Date
September 17, 2024
Last Updated
July 19, 2025
Similar Models
Mistral Small 3.2 24B Instruct
Mistral AI
MM · 23.6B
Best score: 0.9 (HumanEval)
Released: Jun 2025
Mistral Small 3 24B Base
Mistral AI
MM · 23.6B
Best score: 0.9 (ARC)
Released: Jan 2025
Mistral Small 3.1 24B Instruct
Mistral AI
MM · 24.0B
Best score: 0.9 (HumanEval)
Released: Mar 2025
Magistral Medium
Mistral AI
MM · 24.0B
Best score: 0.7 (GPQA)
Released: Jun 2025
Mistral Small 3.1 24B Base
Mistral AI
MM · 24.0B
Best score: 0.8 (MMLU)
Released: Mar 2025
Price: $0.10/1M tokens
Mistral Small 3 24B Instruct
Mistral AI
24.0B
Best score: 0.8 (HumanEval)
Released: Jan 2025
Price: $0.10/1M tokens
Mistral NeMo Instruct
Mistral AI
12.0B
Best score: 0.7 (MMLU)
Released: Jul 2024
Price: $0.15/1M tokens
Magistral Small 2506
Mistral AI
24.0B
Best score: 0.7 (GPQA)
Released: Jun 2025
Released:Jun 2025
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.