Phi 4 Mini

Microsoft

Phi 4 Mini Instruct is a lightweight open model with 3.8 billion parameters built on synthetic data and filtered web data, specializing in high-quality reasoning. It supports a 128K token context window and has been refined for instruction following and safety through supervised fine-tuning and direct preference optimization.

Key Specifications

Parameters
3.8B
Context
128K tokens
Release Date
February 1, 2025
Average Score
65.4%

Timeline

Key dates in the model's history
Announcement
February 1, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
3.8B
Training Tokens
5.0T tokens
Knowledge Cutoff
June 1, 2024
Family
-
Capabilities
Multimodal, ZeroEval

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
HellaSwag
5-shot, Self-reported
69.1%
MMLU
5-shot, Self-reported
67.3%
TruthfulQA
MC2, 10-shot, Self-reported. TruthfulQA's MC2 metric credits the probability mass the model assigns to all true answer choices in a multiple-choice setting; here the model is shown 10 in-context examples before each query.
66.4%
Winogrande
5-shot, Self-reported. The task uses a few-shot format: the model is shown five worked examples before the new query, which demonstrates the expected output format and conditions its behavior through examples rather than explicit instructions.
67.0%
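The k-shot settings above amount to a particular way of building the prompt. A minimal sketch, assuming a plain question/answer template (the example questions below are invented for illustration, not drawn from any benchmark):

```python
def build_few_shot_prompt(examples, query):
    """Concatenate k worked examples, then the new question with an open answer slot."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)

# Two illustrative (hypothetical) examples; a 5-shot run would pass five.
examples = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
]
prompt = build_few_shot_prompt(examples, "What is the capital of Japan?")
```

The model then completes the trailing "A:", and the evaluation harness scores the completion against the reference answer.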

Mathematics

Mathematical problems and computations
GSM8k
8-shot, CoT, Self-reported. Eight worked examples with step-by-step solutions precede each new problem; the chain-of-thought format (phrases like "let's think step by step") prompts the model to lay out its reasoning before giving a final answer.
88.6%
MATH
0-shot, CoT, Self-reported. No in-context examples are given; the model is prompted to reason step by step before answering.
64.0%
MGSM
5-shot, Self-reported
63.9%
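The chain-of-thought settings used for GSM8k and MATH differ from plain few-shot prompting only in the answer template: each worked example shows its reasoning, and the final query ends with a reasoning cue. A sketch, assuming the common "let's think step by step" phrasing:

```python
def build_cot_prompt(question, shots=()):
    """Build a CoT prompt: optional worked examples with visible reasoning,
    then the new question ending in a step-by-step cue (0-shot if shots is empty)."""
    parts = [f"Q: {q}\nA: Let's think step by step. {reasoning}" for q, reasoning in shots]
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)

# 0-shot CoT, as used for the MATH result above (question is illustrative).
p = build_cot_prompt("A train travels 60 km in 1.5 hours. What is its average speed?")
```

For the 8-shot GSM8k setting, `shots` would hold eight (question, worked solution) pairs instead of being empty.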

Reasoning

Logical reasoning and analysis
BIG-Bench Hard
0-shot, CoT, Self-reported. The model solves the task without any additional examples but is encouraged to reason step by step before giving its final answer, making its intermediate thinking explicit.
70.4%
GPQA
0-shot, CoT, Self-reported. The task is posed without any worked examples (0-shot) and the model is asked to reason step by step (chain of thought). Instead of answering immediately, the model works through its reasoning before reaching a solution, relying only on its pretrained knowledge; this helps especially on complex cases.
25.2%

Other Tests

Specialized benchmarks
ARC-C
10-shot, Self-reported. Ten worked examples precede each question, showing the model the expected format and answer style before it is given the new query.
83.7%
Arena Hard
Standard evaluation, Self-reported. The model answers each query directly, without few-shot examples or special prompting, to measure its baseline ability to respond; the trade-off is that without additional instructions or constraints the model's answers may vary in quality.
32.8%
BoolQ
2-shot, Self-reported. Two worked examples precede each query, demonstrating the expected output structure without requiring explicit instructions; for some complex tasks, two examples may be too few to convey the full desired behavior.
81.2%
MMLU-Pro
0-shot, CoT, Self-reported. The model receives the task without any example solutions and is instructed to think step by step (e.g. "Let's think step by step") before giving its final answer, decomposing the problem into simpler parts. Research has shown that even without worked examples, simply prompting the model to reason sequentially can significantly improve performance on tasks requiring multi-step reasoning.
52.8%
Multilingual MMLU
5-shot, Self-reported
49.3%
OpenBookQA
10-shot, Self-reported. Ten worked examples precede each question, demonstrating correct solutions and the expected answer format before the model is asked to solve the new problem.
79.2%
PIQA
5-shot, Self-reported. The few-shot method provides the model with several examples of the completed task before it must perform a new one; in the 5-shot setting, five worked examples demonstrate the expected format and answer. This lets the model adapt to the task without additional training. Compared with zero-shot (no examples) or one-shot (a single example), five examples give the model more to generalize from, though research shows that gains from additional examples eventually plateau.
77.6%
Social IQa
5-shot, Self-reported
72.5%
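The 65.4% "Average Score" shown at the top of this page appears to be the unweighted mean of the 17 self-reported benchmark results listed above (this is an inference from the numbers, not a documented formula):

```python
# The 17 benchmark scores listed on this page, in order of appearance.
scores = [69.1, 67.3, 66.4, 67.0,        # General Knowledge
          88.6, 64.0, 63.9,              # Mathematics
          70.4, 25.2,                    # Reasoning
          83.7, 32.8, 81.2, 52.8, 49.3,  # Other Tests
          79.2, 77.6, 72.5]
average = sum(scores) / len(scores)
print(round(average, 1))  # 65.4, matching the "Average Score" above
```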

License & Metadata

License
MIT
Announcement Date
February 1, 2025
Last Updated
July 19, 2025

Similar Models

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.