Mistral Small 3 24B Base
Multimodal
Mistral Small 3 is competitive with larger models such as Llama 3.3 70B or Qwen 32B and is an excellent open alternative to closed proprietary models like GPT-4o-mini. It matches the quality of Llama 3.3 70B Instruct while running more than 3x faster on the same hardware.
Key Specifications
Parameters
23.6B
Context
-
Release Date
January 30, 2025
Average Score
67.0%
Timeline
Key dates in the model's history
Announcement
January 30, 2025
Last Update
July 19, 2025
Today
March 26, 2026
Technical Specifications
Parameters
23.6B
Training Tokens
-
Knowledge Cutoff
October 1, 2023
Family
-
Capabilities
Multimodal, ZeroEval
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
5-shot • Self-reported
Programming
Programming skills tests
MBPP
Pass@1 metric. Pass@1 measures how often the model solves a task on its first attempt. For each task k, the model is asked for a single answer Ak, which is then evaluated as correct (1) or incorrect (0). Pass@1 = (tasks solved on the first attempt) / (total number of tasks). This metric matters because it shows how far a user can trust the model's first answer, without multiple attempts or verification. A high Pass@1 score indicates a model that solves tasks on the first try, which is critical for many real-world applications. • Self-reported
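The Pass@1 computation described above can be sketched in a few lines. This is a minimal illustration; the function name and the one-boolean-per-task representation are assumptions, not part of any benchmark harness:

```python
def pass_at_1(results):
    """results: one boolean per task; True if the model's single
    (first) answer was judged correct."""
    return sum(results) / len(results)

# 7 of 10 tasks solved on the first attempt
print(pass_at_1([True] * 7 + [False] * 3))  # 0.7
```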
Mathematics
Mathematical problems and computations
GSM8k
5-shot, maj@1. For each task the model is queried 5 times and the majority answer (the most frequent among the samples) is taken as the final answer. This can help smooth over variance introduced by the token-sampling process, at the cost of 5 model queries per task. • Self-reported
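Majority voting over repeated samples, as used in maj@k scoring, can be sketched as follows (illustrative only; the helper name is an assumption):

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent answer among k sampled generations.
    Ties are broken by first occurrence, per Counter.most_common."""
    return Counter(answers).most_common(1)[0][0]

# 5 samples for one GSM8k problem; "42" wins 3-1-1
print(majority_vote(["42", "41", "42", "40", "42"]))  # 42
```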
MATH
5-shot, maj • Self-reported
Reasoning
Logical reasoning and analysis
GPQA
5-shot, CoT. In this method the model is first given 5 worked examples solving different tasks with a chain-of-thought approach. This lets the model see explicit reasoning steps before it solves the new task. The method combines few-shot learning with chain-of-thought reasoning to strengthen the model's ability to solve complex problems: from several reasoning examples, the model can identify solution patterns and apply them to the new task. 5-shot CoT is especially effective for tasks requiring multi-step reasoning, such as math problems, logic puzzles, or tasks requiring analysis. The in-context examples show the model how to structure its thinking and break complex tasks into manageable steps. • Self-reported
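Assembling a few-shot chain-of-thought prompt amounts to concatenating worked examples before the new question. A minimal sketch (the Q/A format and function name are assumptions; real evaluation harnesses use benchmark-specific templates):

```python
def few_shot_cot_prompt(examples, question):
    """examples: (question, reasoning, answer) triples shown to the
    model before the new question; the reasoning is spelled out so
    the model imitates step-by-step solving."""
    blocks = [
        f"Q: {q}\nA: {reasoning} The answer is {answer}."
        for q, reasoning, answer in examples
    ]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)

demo = [("What is 2 + 3?", "2 plus 3 equals 5.", "5")]
print(few_shot_cot_prompt(demo, "What is 4 + 7?"))
```

With 5 such triples instead of 1, this produces exactly the 5-shot CoT setting described above.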
Other Tests
Specialized benchmarks
AGIEval
Step-by-step text comprehension: the process of reading a text and extracting the information needed to perform a task. For example, when solving a math word problem, one first needs to read the problem, then identify the given data and the goal, i.e. what must be found. An example of the step-by-step process: 1. Read the full text to get a general understanding. 2. Identify the components: what exactly is given (all key quantities and their values) and what exactly is required to find. 3. Extract the information into a structured format, noting the relations between the various quantities. 4. Verify that all important components are captured and no information is missing. This approach is especially useful for tasks requiring information extraction, such as math or science problems, since it helps break free-form text into manageable steps. • Self-reported
ARC-C
0-shot: a setting in which the model is given a task without any examples or additional context. Such tasks test the model's base knowledge and its ability to follow instructions. On tasks requiring specialized knowledge, smaller or older models can struggle in 0-shot mode, while newer models trained on more diverse and specialized data can often succeed even without additional prompting. Example: a prompt like "numbers from 1 to 100" with no additional instructions. • Self-reported
MMLU-Pro
0-shot CoT. In this approach, based on the chain-of-thought method, we directly ask the model to solve a task while laying out its reasoning in stages, without providing examples of such a process. Prompts such as "Let's solve this task step by step" or "Let's think step by step" are typically used, prompting the model to generate intermediate reasoning before the final answer. This method is effective for models capable enough to build reasoning chains on their own. It lets the model structure its thinking without needing in-context training examples, which makes the approach more general and less dependent on specific examples. • Self-reported
TriviaQA
5-shot. We provide several examples as demonstrations; using k examples is usually called k-shot prompting. In this case, 5-shot means giving the model 5 example demonstrations before it generates its output. Increasing the number of examples often improves performance by giving the model more information about the task and the expected answer format. However, returns diminish as the number of examples grows, and too many examples can even hurt performance due to context-length limitations or model confusion. • Self-reported
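The k-shot setup described above (here k = 5) reduces to prepending k question–answer demonstrations to the query. A minimal sketch; the function name and Q/A layout are illustrative assumptions:

```python
def k_shot_prompt(demos, query, k=5):
    """demos: (question, answer) pairs; the first k are prepended
    as demonstrations before the new query."""
    lines = [f"Q: {q}\nA: {a}" for q, a in demos[:k]]
    lines.append(f"Q: {query}\nA:")
    return "\n\n".join(lines)

print(k_shot_prompt([("2+2?", "4"), ("3+3?", "6")], "5+5?"))
```

Unlike the CoT variant, the demonstrations here contain only final answers, not reasoning traces.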
License & Metadata
License
Apache 2.0
Announcement Date
January 30, 2025
Last Updated
July 19, 2025
Similar Models
Mistral Small 3.2 24B Instruct
Mistral AI
Multimodal • 23.6B
Best score: 0.9 (HumanEval)
Released: Jun 2025
Pixtral-12B
Mistral AI
Multimodal • 12.4B
Best score: 0.7 (HumanEval)
Released: Sep 2024
Price: $0.15/1M tokens
Mistral Small 3.1 24B Instruct
Mistral AI
Multimodal • 24.0B
Best score: 0.9 (HumanEval)
Released: Mar 2025
Magistral Medium
Mistral AI
Multimodal • 24.0B
Best score: 0.7 (GPQA)
Released: Jun 2025
Mistral Small 3.1 24B Base
Mistral AI
Multimodal • 24.0B
Best score: 0.8 (MMLU)
Released: Mar 2025
Price: $0.10/1M tokens
Mistral Small 3 24B Instruct
Mistral AI
24.0B
Best score: 0.8 (HumanEval)
Released: Jan 2025
Price: $0.10/1M tokens
Mistral NeMo Instruct
Mistral AI
12.0B
Best score: 0.7 (MMLU)
Released: Jul 2024
Price: $0.15/1M tokens
Magistral Small 2506
Mistral AI
24.0B
Best score: 0.7 (GPQA)
Released: Jun 2025
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.