Magistral Medium
Magistral Medium, trained exclusively with reinforcement learning on top of Mistral Medium 3, is a reasoning model that performs strongly on complex math and coding problems without distillation from existing reasoning models. Training uses an RLVR (reinforcement learning with verifiable rewards) framework with modifications to GRPO, improving reasoning ability and multilingual consistency.
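For context on the GRPO mention above: GRPO dispenses with a learned value function and instead normalizes each sampled completion's reward against the other completions generated for the same prompt. A minimal sketch of that group-relative advantage computation (simplified and illustrative, not Mistral's actual training code):

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Group-relative advantages: normalize each completion's reward
    against the mean/std of its own group (one prompt, G sampled completions)."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # epsilon avoids /0

# Example: 4 completions for one prompt, scored by a verifiable reward
# (e.g., 1.0 if the final math answer checks out, 0.0 otherwise).
rewards = np.array([1.0, 0.0, 1.0, 0.0])
print(grpo_advantages(rewards))  # positive for correct samples, negative otherwise
```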
Key Specifications
Parameters
24.0B
Context
-
Release Date
June 10, 2025
Average Score
52.6%
Timeline
Key dates in the model's history
Announcement
June 10, 2025
Last Update
July 19, 2025
Technical Specifications
Parameters
24.0B
Training Tokens
-
Knowledge Cutoff
June 1, 2025
Family
-
Capabilities
Multimodal, ZeroEval
Benchmark Results
Model performance metrics across various tests and benchmarks
Reasoning
Logical reasoning and analysis
GPQA
Score reported with a description of a "diamond" (diverge-then-converge) reasoning pattern that combines step-by-step reasoning with self-consistency: the model first drafts a solution, then produces several independent solutions, and finally reconciles them into one answer.
How the pattern is applied:
1. Initial solution: solve the task with step-by-step thinking, internally if possible.
2. Diversification: generate several independent solutions, for example with different solution methods and verification strategies.
3. Comparison: compare all solutions, check whether the answers and conclusions agree, find errors, and pick the most reliable approach.
4. Final solution: produce the answer based on all approaches, or on the best candidate.
The pattern is especially useful for tasks that demand accuracy, where details are easily missed, or that can be solved in several different ways. Its strengths are greater reliability than single-pass methods, error detection through multiple approaches, and higher confidence in the result.
Example for a volume computation: 1. Initial solution: V = (4/3)πr³. 2. Diversification: solve again with another method, e.g., integration. 3. Comparison: check that the answers agree. • Self-reported
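The diverge-then-converge pattern above is easy to express in code. A minimal sketch, assuming a hypothetical `ask_model` callable that returns one final answer per call (majority vote stands in for the fuller comparison step):

```python
from collections import Counter

def diamond_solve(question: str, ask_model, n_samples: int = 5) -> str:
    """Diverge-then-converge: sample several independent solutions,
    then reconcile them by majority vote on the final answer."""
    candidates = [ask_model(question) for _ in range(n_samples)]
    answer, votes = Counter(candidates).most_common(1)[0]
    return answer  # the answer most solutions agree on

# Usage (ask_model is a stand-in for any LLM call returning a final answer):
# best = diamond_solve("Volume of a sphere with r = 3?", ask_model)
```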
Other Tests
Specialized benchmarks
Aider-Polyglot
accuracy • Self-reported
AIME 2024
pass@1 (first-attempt pass rate) measures the proportion of tasks the model solves correctly on its first attempt. It is a strict metric, since it evaluates the probability of obtaining a correct answer in a single try. To compute pass@1: 1. The model generates one solution per task. 2. Each solution is judged correct or incorrect. 3. pass@1 = (number of correctly solved tasks) / (total number of tasks). The metric matters because it reflects the model's ability to give correct answers without multiple attempts or iterations, which makes it especially relevant for real-world usage. • Self-reported
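In code, the pass@1 computation described above reduces to a single ratio. A minimal sketch (the `is_correct` checker is a hypothetical stand-in for the benchmark's grader):

```python
def pass_at_1(solutions, references, is_correct) -> float:
    """pass@1 = (# tasks solved correctly on the single attempt) / (# tasks)."""
    solved = sum(is_correct(s, ref) for s, ref in zip(solutions, references))
    return solved / len(references)

# Example: 3 of 4 first attempts match the reference answers -> 0.75
sols = ["42", "7", "19", "100"]
refs = ["42", "7", "19", "99"]
print(pass_at_1(sols, refs, lambda s, r: s == r))  # 0.75
```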
AIME 2025
pass@1 (proportion of tasks solved on the first attempt; see the definition under AIME 2024 above) • Self-reported
Humanity's Last Exam
Echo detection: flags cases where the model's answer merely restates the query text instead of engaging with it. This is a problem because users want the model to analyze their text and reason about it, not simply repeat parts of it back. To identify echoing, the model's answer is compared against the query text: if the answer contains essentially the same content as the query, it is treated as an echo; quoted phrases and cases where the query legitimately appears as part of a longer answer are the difficult edge cases. • Self-reported
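One simple way to implement the echo check described above is lexical overlap between answer and query. A minimal sketch (the 0.8 threshold and word-overlap heuristic are illustrative assumptions, not a documented scoring rule):

```python
import re

def is_echo(query: str, answer: str, threshold: float = 0.8) -> bool:
    """Flag an answer that mostly restates the query instead of reasoning:
    measure what fraction of the answer's words already appear in the query."""
    tokenize = lambda s: re.findall(r"[a-z0-9]+", s.lower())
    q_words = set(tokenize(query))
    a_words = tokenize(answer)
    if not a_words:
        return False
    overlap = sum(w in q_words for w in a_words) / len(a_words)
    return overlap >= threshold

print(is_echo("Why is the sky blue?", "The sky is blue because why"))  # True
```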
LiveCodeBench
Method description: a probe of how the LLM handles a false statement embedded in a query, used to evaluate how well the model detects and corrects the false premise instead of accepting it. The method: 1. Construct a query containing a false statement (for example, a counterfactual historical claim). 2. Send the query to the model phrased as though the statement were true. 3. Analyze the model's ability to think critically: does it notice that the premise contradicts the actual facts? This evaluates the model's ability to work with false premises, how well it can explain its reasoning, and the gap between what the query asserts and what is actually accurate. A good answer first notes that the premise is false, and only then (optionally) speculates about the counterfactual. • Self-reported
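A minimal sketch of the false-premise probe described above, assuming a hypothetical `ask_model` callable and using a crude keyword check as a proxy for a real judge model:

```python
def probe_false_premise(ask_model, loaded_question: str) -> bool:
    """Return True if the model pushes back on the false premise
    instead of answering as though it were true (crude keyword check)."""
    answer = ask_model(loaded_question).lower()
    pushback_markers = ("actually", "in fact", "is not true", "incorrect",
                        "there is no", "never happened")
    return any(marker in answer for marker in pushback_markers)

# Usage: a question that presupposes a false fact.
# Einstein won the Nobel Prize for the photoelectric effect, not relativity,
# so a critical model should object rather than explain the false premise.
# flagged = probe_false_premise(ask_model,
#     "Why did Einstein win the Nobel Prize for relativity?")
```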
License & Metadata
License
Apache 2.0
Announcement Date
June 10, 2025
Last Updated
July 19, 2025
Similar Models
Mistral Small 3.1 24B Instruct
Mistral AI
Multimodal · 24.0B
Best score: 0.9 (HumanEval)
Released: Mar 2025
Mistral Small 3.2 24B Instruct
Mistral AI
Multimodal · 23.6B
Best score: 0.9 (HumanEval)
Released: Jun 2025
Mistral Small 3 24B Base
Mistral AI
Multimodal · 23.6B
Best score: 0.9 (ARC)
Released: Jan 2025
Pixtral-12B
Mistral AI
Multimodal · 12.4B
Best score: 0.7 (HumanEval)
Released: Sep 2024
Price: $0.15/1M tokens
Mistral Small 3.1 24B Base
Mistral AI
Multimodal · 24.0B
Best score: 0.8 (MMLU)
Released: Mar 2025
Price: $0.10/1M tokens
GPT OSS 20B
OpenAI
Multimodal · 20.0B
Best score: 0.9 (MMLU)
Released: Aug 2025
Price: $0.10/1M tokens
Qwen2.5 VL 32B Instruct
Alibaba
Multimodal · 33.5B
Best score: 0.9 (HumanEval)
Released: Feb 2025
Mistral NeMo Instruct
Mistral AI
12.0B
Best score: 0.7 (MMLU)
Released: Jul 2024
Price: $0.15/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.