
Mistral Small 3 24B Instruct

Mistral AI

Mistral Small 3 is a 24 billion parameter LLM distributed under the Apache-2.0 license. The model focuses on instruction following with low latency and high efficiency, maintaining performance comparable to larger models. It delivers fast and accurate responses for conversational agents, function calling, and domain-specific fine-tuning. Suitable for local inference when quantized, it competes with models 2-3x its size while using significantly fewer computational resources.
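For the local-inference case mentioned above, here is a minimal sketch that loads a 4-bit quantized build via Hugging Face transformers and bitsandbytes. The repository id and quantization settings are assumptions for illustration, not official guidance:

```python
# Minimal sketch: local chat inference with a 4-bit quantized build.
# The repo id below is an assumption; verify it on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-Small-24B-Instruct-2501"  # assumed repo id
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant, device_map="auto"
)

messages = [{"role": "user", "content": "Write a one-line haiku about latency."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```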

Key Specifications

Parameters
24.0B
Context
32.0K
Release Date
January 30, 2025
Average Score
71.7%

Timeline

Key dates in the model's history
Announcement
January 30, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
24.0B
Training Tokens
-
Knowledge Cutoff
October 1, 2023
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$0.10
Output (per 1M tokens)
$0.30
Max Input Tokens
32.0K
Max Output Tokens
32.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
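A quick worked example of the listed rates (a sketch of the arithmetic only; actual billing may differ in rounding or minimums):

```python
# Cost arithmetic for the listed rates: $0.10 per 1M input tokens,
# $0.30 per 1M output tokens.
def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * 0.10 / 1_000_000 + output_tokens * 0.30 / 1_000_000

# A request that fills the 32K input window and returns 1K tokens:
print(f"${request_cost_usd(32_000, 1_000):.4f}")  # $0.0035
```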

Benchmark Results

Model performance metrics across various tests and benchmarks

Programming

Programming skills tests
HumanEval
5-shot chain-of-thought (CoT): each question is preceded by worked examples that break the problem into parts, reason through the steps, and close with an explicit "Answer:" line; the model is expected to follow the same format. Self-reported.
84.8%
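The 5-shot CoT protocol above can be sketched as a prompt-assembly routine; the example problem and template wording below are illustrative assumptions, not the benchmark's actual prompt:

```python
# Hypothetical sketch of a 5-shot CoT prompt like the one described above.
FEW_SHOT = [
    ("What is 3 * (2 + 5)?",
     "Break it into parts: 2 + 5 = 7, then 3 * 7 = 21. Answer: 21"),
    # ... four more worked (question, reasoning) pairs in the same format
]

def build_cot_prompt(question: str) -> str:
    shots = "\n\n".join(
        f"Question: {q}\nLet's solve step by step. {r}" for q, r in FEW_SHOT
    )
    return f"{shots}\n\nQuestion: {question}\nLet's solve step by step."
```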

Mathematics

Mathematical problems and computations
MATH
Self-reported.
70.6%

Reasoning

Logical reasoning and analysis
GPQA
5-shot chain-of-thought (CoT): the model is shown five fully worked CoT examples that share one format and solution method, then answers a new question in the same way. Responses are judged on two criteria: (a) does the model follow the demonstrated format and reasoning, and (b) is the final answer correct? Self-reported.
45.3%

Other Tests

Specialized benchmarks
Arena Hard
Responses are graded on two criteria. Correctness (0-3): 0 = missing or wrong solution, 1 = broadly correct but with errors, 2 = correct with minor issues, 3 = fully correct. Efficiency (0-3): 0 = non-working or far too slow, 1 = works but with a suboptimal approach, 2 = efficient solution, 3 = optimal solution. Each criterion is scored separately and the two are summed for an overall score out of 6 (a scoring sketch appears after this list). Self-reported.
87.6%
IFEval
Score. Self-reported.
82.9%
MMLU-Pro
5-shot chain-of-thought (CoT). Self-reported.
66.3%
MT-Bench
Score. Self-reported.
83.5%
Wild Bench
Self-reported.
52.2%
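As referenced in the Arena Hard entry above, a hypothetical sketch of its 0-6 rubric (function and variable names are illustrative, not from the benchmark's code):

```python
# Each response gets 0-3 for correctness and 0-3 for efficiency, summed to at most 6.
def rubric_score(correctness: int, efficiency: int) -> int:
    if not (0 <= correctness <= 3 and 0 <= efficiency <= 3):
        raise ValueError("each criterion is scored on a 0-3 scale")
    return correctness + efficiency

# Overall score: average the per-response totals as a fraction of the 6-point maximum.
scores = [rubric_score(3, 2), rubric_score(2, 2), rubric_score(3, 3)]
print(sum(scores) / (6 * len(scores)))  # e.g. 0.833
```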

License & Metadata

License
Apache 2.0
Announcement Date
January 30, 2025
Last Updated
July 19, 2025

Similar Models

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.