
Magistral Small 2506

Mistral AI

Based on Mistral Small 3.1 (2503) with added reasoning capabilities — SFT on Magistral Medium traces followed by additional reinforcement learning — this is a small, efficient reasoning model with 24 billion parameters. Magistral Small can be deployed locally, fitting on a single RTX 4090 or a MacBook with 32 GB of RAM once quantized.
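The page does not name a serving stack for local deployment. As one illustrative option, a model of this family can be served with vLLM; the model ID `mistralai/Magistral-Small-2506` and the Mistral-format flags below are assumptions based on common Hugging Face naming and vLLM usage, not details from this page:

```shell
# Illustrative sketch: serve Magistral Small locally with vLLM.
# Model ID and flags are assumed, not taken from this page; a quantized
# build is what makes the single-RTX-4090 / 32 GB MacBook footprint possible.
pip install -U vllm

vllm serve mistralai/Magistral-Small-2506 \
  --tokenizer_mode mistral \
  --config_format mistral \
  --load_format mistral
```

Once the server is up, it exposes an OpenAI-compatible API on port 8000 by default.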

Key Specifications

Parameters
24.0B
Context
-
Release Date
June 10, 2025
Average Score
63.2%

Timeline

Key dates in the model's history
Announcement
June 10, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
24.0B
Training Tokens
-
Knowledge Cutoff
June 1, 2025
Family
-
Capabilities
Multimodal
ZeroEval

Benchmark Results

Model performance metrics across various tests and benchmarks

Reasoning

Logical reasoning and analysis
GPQA Diamond
Self-reported
68.2%

Other Tests

Specialized benchmarks
AIME 2024
Self-reported
70.7%
AIME 2025
Self-reported
62.8%
LiveCodeBench
Self-reported
51.3%

License & Metadata

License
Apache 2.0
Announcement Date
June 10, 2025
Last Updated
July 19, 2025

Similar Models

All Models

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.