Key Specifications
Parameters
24.0B
Context
128.0K
Release Date
March 17, 2025
Average Score
62.9%
Timeline
Key dates in the model's history
Announcement
March 17, 2025
Last Update
July 19, 2025
Technical Specifications
Parameters
24.0B
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$0.10
Output (per 1M tokens)
$0.30
Max Input Tokens
128.0K
Max Output Tokens
128.0K
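As a sanity check on the listed rates, here is a small sketch (the token counts are hypothetical) of how the per-1M-token pricing above turns into per-request cost:

```python
# Cost sketch using the listed rates: $0.10 per 1M input tokens,
# $0.30 per 1M output tokens. Token counts are illustrative.
INPUT_RATE = 0.10 / 1_000_000   # USD per input token
OUTPUT_RATE = 0.30 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at the listed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 10K-token prompt with a 2K-token completion:
print(f"${request_cost(10_000, 2_000):.4f}")  # $0.0016
```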
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
A/B-testing LLM improvements with the help of thinking. When large language models (LLMs) work on reasoning tasks, they show better results when they "think step by step." This prompting technique helps the model solve tasks more reliably. However, like other prompting approaches, it has two key limitations:
1. It is applied to all tasks uniformly, even though the model can reach higher accuracy with different strategies on different tasks.
2. It is not always efficient, since the intermediate computation is emitted as text.
In this work, we propose an approach that allows the model to switch between different "thinking" modes depending on the task. We show that switching between these modes substantially improves overall performance, especially on tasks requiring mathematical or complex logical reasoning.
Methodology: we define and compare the main thinking modes a model can use:
1. Direct mode: the model answers immediately, without intermediate reasoning steps.
2. Step-by-step mode: the model breaks a complex task into a sequence of intermediate steps before providing a final answer.
3. Verification mode: the model proposes an answer, then checks its correctness and corrects errors.
4. Adaptive mode: the model first determines the mode for the given task, and then applies it.
• Self-reported
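A minimal sketch of the adaptive-mode idea described above, assuming a hypothetical `generate(prompt)` callable standing in for any LLM API; the mode names and prompt templates are illustrative, not the authors' actual prompts:

```python
# Adaptive thinking-mode routing (illustrative). `generate` is a
# hypothetical callable (prompt -> completion) for any LLM API.
from typing import Callable

MODE_PROMPTS = {
    "direct": "Answer the question directly.\n\nQ: {q}\nA:",
    "cot": "Solve the problem step by step, then state the final answer.\n\nQ: {q}\n",
    "verify": "Propose an answer, then check it for errors and correct it.\n\nQ: {q}\n",
}

def adaptive_answer(q: str, generate: Callable[[str], str]) -> str:
    # Step 1: ask the model to pick a thinking mode for this task.
    router = (
        "Pick the best strategy for the task below: direct, cot, or verify. "
        f"Reply with one word.\n\nTask: {q}"
    )
    mode = generate(router).strip().lower()
    if mode not in MODE_PROMPTS:
        mode = "cot"  # fall back to step-by-step reasoning
    # Step 2: answer using the chosen mode's prompt template.
    return generate(MODE_PROMPTS[mode].format(q=q))
```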
Reasoning
Logical reasoning and analysis
GPQA
Accuracy • Self-reported
Multimodal
Working with images and visual data
MMMU
Accuracy, CoT. We analyze the model's efficiency when using the Chain-of-Thought (CoT) method to solve tasks that require reasoning. CoT asks the model not simply to generate a final answer, but to reason step by step before answering. Evaluation proceeds as follows:
1. The model is given questions that require reasoning and is asked to show its intermediate steps.
2. Not only the final answer is assessed, but also the correctness of the reasoning.
3. The accuracy of answers produced with CoT is compared against direct answers.
The analysis measures how well the model breaks complex tasks into simpler steps, improvements in reasoning accuracy, and errors that can arise during the reasoning process.
• Self-reported
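A sketch of the CoT-vs-direct comparison described above, under the same assumptions (a hypothetical `generate` callable and a labeled question set); the substring-match scoring is a deliberate simplification:

```python
# A/B comparison of CoT vs. direct prompting (illustrative).
from typing import Callable, Iterable, Tuple

def accuracy(questions: Iterable[Tuple[str, str]],
             generate: Callable[[str], str],
             cot: bool) -> float:
    """Fraction of questions whose gold answer appears in the reply."""
    hits = total = 0
    for q, gold in questions:
        prompt = (f"Q: {q}\nLet's think step by step." if cot
                  else f"Q: {q}\nA:")
        reply = generate(prompt)
        # Naive scoring: check that the gold answer appears in the reply.
        hits += int(gold.lower() in reply.lower())
        total += 1
    return hits / max(total, 1)

# Running both conditions on the same question set isolates the effect
# of the intermediate reasoning steps:
# delta = accuracy(data, generate, cot=True) - accuracy(data, generate, cot=False)
```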
Other Tests
Specialized benchmarks
MMLU-Pro
0-shot CoT. Zero-shot Chain-of-Thought (0-shot CoT) encourages the model to solve tasks step by step without providing concrete examples of the reasoning process. The approach uses generic prompts, such as "Let's solve this task step by step," to nudge the model into reasoning before its final answer. 0-shot CoT is especially useful when there is no way to craft or supply reasoning samples for a specific task. It usually outperforms plain prompts without reasoning, since it lets the model break a complex task into more manageable subtasks, which reduces the probability of errors on multi-step problems. Nevertheless, 0-shot CoT usually trails few-shot CoT, in which worked examples are provided, because the model receives no concrete guidance on how to structure its thinking for the specific task; without examples, reasoning quality depends on the model's ability to analyze the task on its own.
• Self-reported
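The contrast between plain, 0-shot CoT, and few-shot CoT prompting can be made concrete with string templates; the question and the worked example below are invented for illustration:

```python
# Three prompting styles for the same question (illustrative).
question = "A train travels 120 km in 1.5 hours. What is its average speed?"

plain = f"Q: {question}\nA:"

# 0-shot CoT: a generic trigger phrase, no worked examples.
zero_shot_cot = f"Q: {question}\nA: Let's solve this task step by step."

# Few-shot CoT: prepend worked examples that demonstrate the reasoning format.
few_shot_cot = (
    "Q: A car travels 60 km in 0.5 hours. What is its average speed?\n"
    "A: Speed = distance / time = 60 / 0.5 = 120 km/h. Answer: 120 km/h.\n\n"
    f"Q: {question}\nA:"
)
```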
TriviaQA
5-shot • Self-reported
License & Metadata
License
Apache 2.0
Announcement Date
March 17, 2025
Last Updated
July 19, 2025
Similar Models
Mistral Small 3.1 24B Instruct
Mistral AI
MM · 24.0B
Best score: 0.9 (HumanEval)
Released: Mar 2025
Mistral Small 3.2 24B Instruct
Mistral AI
MM · 23.6B
Best score: 0.9 (HumanEval)
Released: Jun 2025
Mistral Small 3 24B Base
Mistral AI
MM · 23.6B
Best score: 0.9 (ARC)
Released: Jan 2025
Pixtral-12B
Mistral AI
MM · 12.4B
Best score: 0.7 (HumanEval)
Released: Sep 2024
Price: $0.15/1M tokens
Magistral Medium
Mistral AI
MM · 24.0B
Best score: 0.7 (GPQA)
Released: Jun 2025
Mistral Small 3 24B Instruct
Mistral AI
24.0B
Best score: 0.8 (HumanEval)
Released: Jan 2025
Price: $0.10/1M tokens
Gemma 3 27B
MM · 27.0B
Best score: 0.9 (HumanEval)
Released: Mar 2025
Price: $0.11/1M tokens
Gemma 3 12B
MM · 12.0B
Best score: 0.9 (HumanEval)
Released: Mar 2025
Price: $0.05/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.