
Mistral Small 3 24B Base

Multimodal
Mistral AI

Mistral Small 3 is competitive with larger models like Llama 3.3 70B or Qwen 32B and is an excellent open alternative to closed proprietary models like GPT-4o mini. Mistral Small 3 matches the quality of Llama 3.3 70B Instruct while running more than 3x faster on the same hardware.

Key Specifications

Parameters
23.6B
Context
-
Release Date
January 30, 2025
Average Score
67.0%

Timeline

Key dates in the model's history
Announcement
January 30, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
23.6B
Training Tokens
-
Knowledge Cutoff
October 1, 2023
Family
-
Capabilities
Multimodal, ZeroEval

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
MMLU
5-shot. Self-reported.
80.7%

Programming

Programming skills tests
MBPP
Pass@1. The Pass@1 metric measures how often the model solves a task on its first attempt. For each task k, the model is asked for a single answer Ak, which is scored as correct (1) or incorrect (0). Pass@1 = (tasks solved on the first attempt) / (total number of tasks). This metric matters because it shows how reliably a user gets a correct answer without retries or verification; a high Pass@1 indicates a model that solves tasks on the first try, which is critical for many real-world applications. Self-reported.
69.6%
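The Pass@1 calculation described above can be sketched in a few lines (a minimal illustration; the result list is hypothetical example data, not MBPP output):

```python
def pass_at_1(first_attempt_correct: list[bool]) -> float:
    """Pass@1 = (tasks solved on the first attempt) / (total tasks)."""
    if not first_attempt_correct:
        return 0.0
    return sum(first_attempt_correct) / len(first_attempt_correct)

# Example: 7 of 10 tasks solved on the first try -> Pass@1 = 0.7
results = [True, True, False, True, True, False, True, True, False, True]
print(pass_at_1(results))  # 0.7
```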

Mathematics

Mathematical problems and computations
GSM8k
5-shot, maj@1. For each task the model is sampled 5 times and the majority answer is taken (the most frequent answer in case of disagreement). Majority voting can correct errors caused by unlucky token sampling, but it requires 5 model queries per task. Self-reported.
80.7%
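The majority-vote step of maj@1 can be sketched as follows (a minimal illustration; the sample answers are hypothetical):

```python
from collections import Counter

def maj_at_1(samples: list[str]) -> str:
    """Take the most frequent answer among the sampled generations.

    Counter.most_common(1) returns the answer with the highest count;
    ties are broken by first insertion order.
    """
    return Counter(samples).most_common(1)[0][0]

# Example: 5 samples for one GSM8k question; "42" wins 3-2.
print(maj_at_1(["42", "41", "42", "42", "41"]))  # 42
```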
MATH
5-shot, maj. Self-reported.
46.0%

Reasoning

Logical reasoning and analysis
GPQA
5-shot, CoT. In this method the model is first shown 5 example solutions to varied tasks that use the chain-of-thought approach, so it lays out its reasoning steps before solving the new task. This combines few-shot learning with chain-of-thought prompting to strengthen the model's ability to solve complex problems: from several reasoning examples, the model can identify solution patterns and apply them to new tasks. 5-shot CoT is especially effective for problems requiring multi-step reasoning, such as math problems, logic puzzles, or tasks requiring analysis, because the in-context examples show the model how to structure its thinking and break a complex task into manageable steps. Self-reported.
34.4%

Other Tests

Specialized benchmarks
AGIEval
The benchmark tests the process of understanding a text and extracting the information needed to carry out a task. For example, solving a math word problem first requires reading the full text for general understanding, then identifying the components: the given quantities and their values, and the goal, i.e. what must be found, and finally checking that all important information has been captured and nothing irrelevant included. This is especially useful for tasks that hinge on extracting information from text, such as mathematical or scientific problems, because it helps break the text down into manageable steps. Self-reported.
65.8%
ARC-C
0-shot. The model is given a task without any examples or additional context. Such tasks test the model's basic knowledge and its ability to follow instructions. For tasks requiring specialized knowledge, older or smaller models may struggle in 0-shot mode, while newer models trained on more diverse and specialized data can often succeed even without additional prompting. Example: "Sum the numbers from 1 to 100" posed without any additional instructions. Self-reported.
91.3%
MMLU-Pro
0-shot CoT. In this variant of chain-of-thought prompting, the model is asked to solve the task directly, laying out its reasoning in stages, without being shown any examples of such a process. Typically, prompts like "Let's solve this task step by step" or "Let's think step by step" are used, causing the model to generate intermediate reasoning before the final answer. This method works for models capable enough to build reasoning chains on their own. It lets the model structure its thinking without in-context training examples, which makes the approach cheaper and less dependent on the choice of specific examples. Self-reported.
54.4%
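The zero-shot CoT prompt pattern described above can be sketched as a one-line wrapper (illustrative only, not a specific evaluation harness's API):

```python
def zero_shot_cot_prompt(question: str) -> str:
    """Zero-shot CoT: append a step-by-step cue instead of examples."""
    return f"{question}\nLet's think step by step."

print(zero_shot_cot_prompt(
    "If a train travels 60 km in 45 minutes, what is its average speed in km/h?"
))
```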
TriviaQA
5-shot. Several examples are provided as demonstrations; using k examples is usually called k-shot prompting. Here, 5-shot means the model is given 5 example demonstrations before it generates its output. Increasing the number of examples often improves performance by giving the model more information about the task and the expected answer format. However, returns diminish as the count grows, and too many examples can even hurt performance due to context-length limitations or model distraction. Self-reported.
80.3%
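Assembling a k-shot prompt as described above can be sketched like this (a minimal illustration; the Q/A template and demonstration pairs are assumptions, not the benchmark's actual format):

```python
def build_k_shot_prompt(examples: list[tuple[str, str]], question: str) -> str:
    """Assemble a k-shot prompt: k (question, answer) demonstrations,
    followed by the new question the model must answer."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

# Example with k = 2 demonstrations.
demos = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
]
print(build_k_shot_prompt(demos, "What is the capital of Japan?"))
```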

License & Metadata

License
Apache 2.0
Announcement Date
January 30, 2025
Last Updated
July 19, 2025
