
Mistral Small 3 24B Instruct

Mistral AI

Mistral Small 3 is a 24 billion parameter LLM distributed under the Apache-2.0 license. The model focuses on instruction following with low latency and high efficiency, maintaining performance comparable to larger models. It delivers fast and accurate responses for conversational agents, function calling, and domain-specific fine-tuning. Suitable for local inference when quantized, it competes with models 2-3x its size while using significantly fewer computational resources.
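For the local-inference case mentioned above, here is a minimal sketch that loads a 4-bit quantized build via Hugging Face transformers and bitsandbytes. The repository id and quantization settings are assumptions for illustration, not official guidance:

```python
# Minimal sketch: local chat inference with a 4-bit quantized build.
# The repo id below is an assumption; verify it on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-Small-24B-Instruct-2501"  # assumed repo id
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant, device_map="auto"
)

messages = [{"role": "user", "content": "Write a one-line haiku about latency."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```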

Key Specifications

Parameters
24.0B
Context
32.0K
Release Date
January 30, 2025
Average Score
71.7%

Timeline

Key dates in the model's history
Announcement
January 30, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
24.0B
Training Tokens
-
Knowledge Cutoff
October 1, 2023
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$0.10
Output (per 1M tokens)
$0.30
Max Input Tokens
32.0K
Max Output Tokens
32.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
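A quick worked example of the listed rates (a sketch of the arithmetic only; actual billing may differ in rounding or minimums):

```python
# Cost arithmetic for the listed rates: $0.10 per 1M input tokens,
# $0.30 per 1M output tokens.
def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * 0.10 / 1_000_000 + output_tokens * 0.30 / 1_000_000

# A request that fills the 32K input window and returns 1K tokens:
print(f"${request_cost_usd(32_000, 1_000):.4f}")  # $0.0035
```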

Benchmark Results

Model performance metrics across various tests and benchmarks

Programming

Programming skills tests
HumanEval
5-shot chain-of-thought (CoT): each question is preceded by worked examples that break the problem into parts, reason through the steps, and close with an explicit "Answer:" line; the model is expected to follow the same format. Self-reported.
84.8%
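The 5-shot CoT protocol above can be sketched as a prompt-assembly routine; the example problem and template wording below are illustrative assumptions, not the benchmark's actual prompt:

```python
# Hypothetical sketch of a 5-shot CoT prompt like the one described above.
FEW_SHOT = [
    ("What is 3 * (2 + 5)?",
     "Break it into parts: 2 + 5 = 7, then 3 * 7 = 21. Answer: 21"),
    # ... four more worked (question, reasoning) pairs in the same format
]

def build_cot_prompt(question: str) -> str:
    shots = "\n\n".join(
        f"Question: {q}\nLet's solve step by step. {r}" for q, r in FEW_SHOT
    )
    return f"{shots}\n\nQuestion: {question}\nLet's solve step by step."
```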

Mathematics

Mathematical problems and computations
MATH
Self-reported.
70.6%

Reasoning

Logical reasoning and analysis
GPQA
5-shot chain-of-thought (CoT): the model is shown five fully worked CoT examples that share one format and solution method, then answers a new question in the same way. Responses are judged on two criteria: (a) does the model follow the demonstrated format and reasoning, and (b) is the final answer correct? Self-reported.
45.3%

Other Tests

Specialized benchmarks
Arena Hard
Responses are graded on two criteria. Correctness (0-3): 0 = missing or wrong solution, 1 = broadly correct but with errors, 2 = correct with minor issues, 3 = fully correct. Efficiency (0-3): 0 = non-working or far too slow, 1 = works but with a suboptimal approach, 2 = efficient solution, 3 = optimal solution. Each criterion is scored separately and the two are summed for an overall score out of 6 (a scoring sketch appears after this list). Self-reported.
87.6%
IFEval
Score. Self-reported.
82.9%
MMLU-Pro
5-shot chain-of-thought (CoT). Self-reported.
66.3%
MT-Bench
Score. Self-reported.
83.5%
Wild Bench
Self-reported.
52.2%
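As referenced in the Arena Hard entry above, a hypothetical sketch of its 0-6 rubric (function and variable names are illustrative, not from the benchmark's code):

```python
# Each response gets 0-3 for correctness and 0-3 for efficiency, summed to at most 6.
def rubric_score(correctness: int, efficiency: int) -> int:
    if not (0 <= correctness <= 3 and 0 <= efficiency <= 3):
        raise ValueError("each criterion is scored on a 0-3 scale")
    return correctness + efficiency

# Overall score: average the per-response totals as a fraction of the 6-point maximum.
scores = [rubric_score(3, 2), rubric_score(2, 2), rubric_score(3, 3)]
print(sum(scores) / (6 * len(scores)))  # e.g. 0.833
```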

License & Metadata

License
Apache 2.0
Announcement Date
January 30, 2025
Last Updated
July 19, 2025

Similar Models

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.