Key Specifications
Parameters
12.0B
Context
128.0K
Release Date
July 18, 2024
Average Score
64.3%
Timeline
Key dates in the model's history
Announcement
July 18, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
12.0B
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$0.15
Output (per 1M tokens)
$0.15
Max Input Tokens
128.0K
Max Output Tokens
128.0K
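At the listed rates ($0.15 per 1M tokens for both input and output), the cost of a request is simple arithmetic. A minimal sketch; the function name and token counts are illustrative, not part of any official SDK:

```python
# Listed pricing from the table above: $0.15 per 1,000,000 tokens,
# same rate for input and output.
INPUT_PRICE_PER_M = 0.15   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.15  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token completion:
print(f"${request_cost(2_000, 500):.6f}")  # → $0.000375
```

At these rates a full 128K-token input costs about $0.0192, which is useful for budgeting long-context workloads.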
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
HellaSwag
0-shot evaluation: the model answers each item directly, with no worked examples in the prompt. • Self-reported
MMLU
5-shot evaluation: before being asked to solve the target task, the model is shown five worked examples of similar tasks, chosen to match the target's format and difficulty. This measures the model's ability to learn in context from a few demonstrations. Both the final answer and the correctness of the solution process are assessed, which gives a fuller picture of capability than a zero-shot test and better matches how the model is used in real scenarios. • Self-reported
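The 5-shot setup amounts to assembling a prompt with five demonstrations ahead of the target question. A minimal sketch in Python; `build_few_shot_prompt` and the example Q/A pairs are hypothetical illustrations, not taken from the benchmark:

```python
# Sketch of 5-shot prompt assembly (illustrative only; the example
# question/answer pairs are placeholders, not real benchmark items).
def build_few_shot_prompt(examples, question, k=5):
    """Prepend k worked (question, answer) pairs to the target question."""
    lines = [f"Q: {q}\nA: {a}" for q, a in examples[:k]]
    lines.append(f"Q: {question}\nA:")
    return "\n\n".join(lines)

examples = [(f"example question {i}", f"example answer {i}")
            for i in range(1, 6)]
prompt = build_few_shot_prompt(examples, "target question")
# The model completes the final "A:" by imitating the demonstrated format.
```

Scoring then compares the model's completion of the final "A:" against the reference answer.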
TruthfulQA
0-shot evaluation: the model receives the task immediately, without additional examples, prompts, or explanations. This measures how well the model understands and performs a task "out of the box", relying only on what it learned during pretraining, without extra help or instructions that could influence the result. This approach is especially useful for gauging the model's ability to generalize and apply knowledge in new situations. • Self-reported
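By contrast with the few-shot setting, a 0-shot prompt contains only the task itself. A minimal sketch; the function name and prompt format are illustrative assumptions:

```python
# Sketch of a 0-shot prompt: no demonstrations precede the question,
# so the model must answer from pretrained knowledge alone.
def build_zero_shot_prompt(question: str) -> str:
    """Return a bare question prompt with no worked examples."""
    return f"Q: {question}\nA:"

prompt = build_zero_shot_prompt("target question")
```

The absence of demonstrations is what makes this setting a test of baseline capability rather than format imitation.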
Winogrande
0-shot evaluation: a testing format in which the model receives no special instructions or examples for solving the specific task. Instead, it solves each item relying exclusively on knowledge acquired during pretraining. Each task is presented to the model directly, without additional instructions, examples, or prompts, which most accurately measures the model's baseline capabilities rather than its ability to follow demonstrations. • Self-reported
Other Tests
Specialized benchmarks
CommonSenseQA
Zero-shot (0-shot) evaluation: the model is given a task without any prior examples or instructions on how to solve it. This contrasts with few-shot evaluation, where the model sees several examples demonstrating the expected format or reasoning. In the zero-shot scenario the model must rely only on its pretrained knowledge and abilities to work out how to approach the task. This is considered more difficult but also more representative of real usage, since users often provide no examples before a query. Zero-shot evaluations are widely used in benchmarks and research to assess a model's baseline abilities; however, they can understate a model's capabilities if the task is phrased ambiguously or the model does not fully grasp what is required without additional context. • Self-reported
Natural Questions
5-shot evaluation: the prompt contains five example question-answer pairs, followed by the target question for the model to answer. • Self-reported
OpenBookQA
Zero-shot ("no examples") evaluation: the model is asked to perform the task without being given any examples. It must understand the instructions and complete the task relying exclusively on capabilities and knowledge acquired during training. This is especially useful for assessing how well the model understands and performs tasks it has not encountered, and for measuring its ability to generalize. It can also reveal gaps in the model's knowledge or limitations in how it interprets instructions. Unlike few-shot evaluation, where the model is given samples to establish the expected format or result, the zero-shot method more directly probes the model's pretrained abilities. • Self-reported
TriviaQA
5-shot evaluation • Self-reported
License & Metadata
License
Apache 2.0
Announcement Date
July 18, 2024
Last Updated
July 19, 2025
Similar Models
Mistral Small 3 24B Instruct
Mistral AI
24.0B
Best score: 0.8 (HumanEval)
Released: Jan 2025
Price: $0.10/1M tokens
Magistral Small 2506
Mistral AI
24.0B
Best score: 0.7 (GPQA)
Released: Jun 2025
Devstral Small 1.1
Mistral AI
24.0B
Released: Jul 2025
Price: $0.10/1M tokens
Mistral Small
Mistral AI
22.0B
Released: Sep 2024
Price: $0.20/1M tokens
Codestral-22B
Mistral AI
22.2B
Best score: 0.8 (HumanEval)
Released: May 2024
Price: $0.20/1M tokens
Mistral Small 3.2 24B Instruct
Mistral AI
23.6B (Multimodal)
Best score: 0.9 (HumanEval)
Released: Jun 2025
Mistral Small 3 24B Base
Mistral AI
23.6B (Multimodal)
Best score: 0.9 (ARC)
Released: Jan 2025
Pixtral-12B
Mistral AI
12.4B (Multimodal)
Best score: 0.7 (HumanEval)
Released: Sep 2024
Price: $0.15/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.