Mistral Small 3.1 24B Base

Multimodal
Mistral AI

The pre-trained base version of the Mistral Small 3.1 model. Features improved text capabilities, multimodal understanding, multilingual abilities, and an expanded 128k token context window compared to Mistral Small 3. Intended for fine-tuning.

Key Specifications

Parameters
24.0B
Context
128.0K
Release Date
March 17, 2025
Average Score
62.9%

Timeline

Key dates in the model's history
Announcement
March 17, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
24.0B
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$0.10
Output (per 1M tokens)
$0.30
Max Input Tokens
128.0K
Max Output Tokens
128.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
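As a quick sanity check on the listed rates, here is a minimal sketch of a per-request cost estimate. The prices are hard-coded from the table above; the function name is illustrative, not part of any API:

```python
# Token prices from the pricing table above (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.30

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 100K-token prompt with a 2K-token completion.
cost = request_cost(100_000, 2_000)
print(f"${cost:.4f}")  # -> $0.0106
```

Even a near-full 128K context costs on the order of a cent at these rates, which is what makes the model attractive as a fine-tuning base.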

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
MMLU
Self-reported
81.0%

Reasoning

Logical reasoning and analysis
GPQA
Accuracy
Self-reported
37.5%

Multimodal

Working with images and visual data
MMMU
Accuracy with CoT
Evaluated with Chain-of-Thought (CoT) prompting: instead of generating the final answer directly, the model is asked to reason step by step before answering. Evaluation proceeds as follows: (1) the model is given questions that require reasoning and asked to show its intermediate steps; (2) both the final answer and the correctness of the reasoning are assessed; (3) accuracy with CoT is compared against direct answering. This measures how well the model decomposes complex tasks into simpler steps, how much CoT improves answer accuracy, and what errors arise during the reasoning process.
Self-reported
59.3%
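The direct-vs-CoT comparison described above can be sketched as a small evaluation loop. Here `ask_model` is a hypothetical stand-in for a real model call, and the substring answer check is a simplification of proper answer matching:

```python
from typing import Callable

def evaluate(ask_model: Callable[[str], str],
             questions: list[tuple[str, str]]) -> dict:
    """Compare direct answering against chain-of-thought prompting.

    `questions` holds (question, expected_answer) pairs;
    `ask_model` is a hypothetical stand-in for a real model call.
    """
    scores = {"direct": 0, "cot": 0}
    for question, expected in questions:
        # Direct mode: ask for the answer alone.
        if expected in ask_model(question):
            scores["direct"] += 1
        # CoT mode: ask the model to show intermediate steps first.
        cot_prompt = question + "\nThink step by step, then state the final answer."
        if expected in ask_model(cot_prompt):
            scores["cot"] += 1
    n = len(questions)
    return {mode: hits / n for mode, hits in scores.items()}

# Toy stub that only answers correctly when prompted to reason.
stub = lambda p: "42" if "step by step" in p else "unsure"
print(evaluate(stub, [("What is 6 * 7?", "42")]))  # {'direct': 0.0, 'cot': 1.0}
```

Real benchmark harnesses additionally grade the reasoning trace itself, not only the final answer, as the description above notes.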

Other Tests

Specialized benchmarks
MMLU-Pro
0-shot CoT
Zero-shot Chain-of-Thought (0-shot CoT) prompting encourages the model to solve tasks step by step without being shown worked examples of the reasoning process. It relies on generic prompts such as "Let's solve this task step by step" to trigger reasoning before the final answer. The method is especially useful when no labeled reasoning examples are available for the task. It usually outperforms plain prompting without reasoning, since breaking a complex task into more manageable subtasks reduces errors on multi-step problems. It nevertheless tends to underperform few-shot CoT, where worked examples are provided, because the model receives no concrete demonstration of how to structure its reasoning for the specific task; quality therefore depends on the model's ability to analyze the task on its own.
Self-reported
56.0%
TriviaQA
5-shot
Self-reported
80.5%
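The 0-shot CoT setup described for MMLU-Pro can be sketched in a few lines. The trigger phrase matches the generic prompt quoted above; the answer-extraction heuristic is an assumption, not the benchmark's exact protocol:

```python
COT_TRIGGER = "Let's solve this task step by step."

def zero_shot_cot_prompt(question: str) -> str:
    """Append a generic reasoning trigger; no worked examples are given."""
    return f"{question}\n{COT_TRIGGER}"

def extract_final_answer(completion: str) -> str:
    """Take the last non-empty line as the final answer (a common heuristic)."""
    lines = [ln.strip() for ln in completion.splitlines() if ln.strip()]
    return lines[-1] if lines else ""

prompt = zero_shot_cot_prompt("A train travels 120 km in 2 hours. What is its speed?")
print(prompt)
print(extract_final_answer("120 / 2 = 60\nThe speed is 60 km/h"))
```

By contrast, the 5-shot TriviaQA setting below prepends five worked question-answer examples to the prompt instead of a reasoning trigger.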

License & Metadata

License
Apache 2.0
Announcement Date
March 17, 2025
Last Updated
July 19, 2025

Similar Models

All Models

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.