Mistral Small 3.1 24B Base

Multimodal
Mistral AI

The pre-trained base version of the Mistral Small 3.1 model. Features improved text capabilities, multimodal understanding, multilingual abilities, and an expanded 128k token context window compared to Mistral Small 3. Intended for fine-tuning.

Key Specifications

Parameters
24.0B
Context
128.0K
Release Date
March 17, 2025
Average Score
62.9%

Timeline

Key dates in the model's history
Announcement
March 17, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
24.0B
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$0.10
Output (per 1M tokens)
$0.30
Max Input Tokens
128.0K
Max Output Tokens
128.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
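As a quick sanity check on the listed rates, here is a minimal sketch of a per-request cost estimate. The prices are hard-coded from the table above; the function name is illustrative, not part of any API:

```python
# Token prices from the pricing table above (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.30

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 100K-token prompt with a 2K-token completion.
cost = request_cost(100_000, 2_000)
print(f"${cost:.4f}")  # -> $0.0106
```

Even a near-full 128K context costs on the order of a cent at these rates, which is what makes the model attractive as a fine-tuning base.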

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
MMLU
Self-reported
81.0%

Reasoning

Logical reasoning and analysis
GPQA
Accuracy
Self-reported
37.5%

Multimodal

Working with images and visual data
MMMU
Accuracy with CoT
Evaluated with Chain-of-Thought (CoT) prompting: instead of generating the final answer directly, the model is asked to reason step by step before answering. Evaluation proceeds as follows: (1) the model is given questions that require reasoning and asked to show its intermediate steps; (2) both the final answer and the correctness of the reasoning are assessed; (3) accuracy with CoT is compared against direct answering. This measures how well the model decomposes complex tasks into simpler steps, how much CoT improves answer accuracy, and what errors arise during the reasoning process.
Self-reported
59.3%
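The direct-vs-CoT comparison described above can be sketched as a small evaluation loop. Here `ask_model` is a hypothetical stand-in for a real model call, and the substring answer check is a simplification of proper answer matching:

```python
from typing import Callable

def evaluate(ask_model: Callable[[str], str],
             questions: list[tuple[str, str]]) -> dict:
    """Compare direct answering against chain-of-thought prompting.

    `questions` holds (question, expected_answer) pairs;
    `ask_model` is a hypothetical stand-in for a real model call.
    """
    scores = {"direct": 0, "cot": 0}
    for question, expected in questions:
        # Direct mode: ask for the answer alone.
        if expected in ask_model(question):
            scores["direct"] += 1
        # CoT mode: ask the model to show intermediate steps first.
        cot_prompt = question + "\nThink step by step, then state the final answer."
        if expected in ask_model(cot_prompt):
            scores["cot"] += 1
    n = len(questions)
    return {mode: hits / n for mode, hits in scores.items()}

# Toy stub that only answers correctly when prompted to reason.
stub = lambda p: "42" if "step by step" in p else "unsure"
print(evaluate(stub, [("What is 6 * 7?", "42")]))  # {'direct': 0.0, 'cot': 1.0}
```

Real benchmark harnesses additionally grade the reasoning trace itself, not only the final answer, as the description above notes.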

Other Tests

Specialized benchmarks
MMLU-Pro
0-shot CoT
Zero-shot Chain-of-Thought (0-shot CoT) prompting encourages the model to solve tasks step by step without being shown worked examples of the reasoning process. It relies on generic prompts such as "Let's solve this task step by step" to trigger reasoning before the final answer. The method is especially useful when no labeled reasoning examples are available for the task. It usually outperforms plain prompting without reasoning, since breaking a complex task into more manageable subtasks reduces errors on multi-step problems. It nevertheless tends to underperform few-shot CoT, where worked examples are provided, because the model receives no concrete demonstration of how to structure its reasoning for the specific task; quality therefore depends on the model's ability to analyze the task on its own.
Self-reported
56.0%
TriviaQA
5-shot
Self-reported
80.5%
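The 0-shot CoT setup described for MMLU-Pro can be sketched in a few lines. The trigger phrase matches the generic prompt quoted above; the answer-extraction heuristic is an assumption, not the benchmark's exact protocol:

```python
COT_TRIGGER = "Let's solve this task step by step."

def zero_shot_cot_prompt(question: str) -> str:
    """Append a generic reasoning trigger; no worked examples are given."""
    return f"{question}\n{COT_TRIGGER}"

def extract_final_answer(completion: str) -> str:
    """Take the last non-empty line as the final answer (a common heuristic)."""
    lines = [ln.strip() for ln in completion.splitlines() if ln.strip()]
    return lines[-1] if lines else ""

prompt = zero_shot_cot_prompt("A train travels 120 km in 2 hours. What is its speed?")
print(prompt)
print(extract_final_answer("120 / 2 = 60\nThe speed is 60 km/h"))
```

By contrast, the 5-shot TriviaQA setting below prepends five worked question-answer examples to the prompt instead of a reasoning trigger.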

License & Metadata

License
Apache 2.0
Announcement Date
March 17, 2025
Last Updated
July 19, 2025

Similar Models

All Models

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.