Phi 4 Mini
Phi 4 Mini Instruct is a lightweight open model with 3.8 billion parameters, trained on synthetic data and filtered web data with a focus on high-quality, reasoning-dense content. It supports a 128K-token context window and was refined for instruction following and safety through supervised fine-tuning and direct preference optimization.
Key Specifications
Parameters
3.8B
Context
128K tokens
Release Date
February 1, 2025
Average Score
65.4%
Timeline
Key dates in the model's history
Announcement
February 1, 2025
Last Update
July 19, 2025
Technical Specifications
Parameters
3.8B
Training Tokens
5.0T tokens
Knowledge Cutoff
June 1, 2024
Family
-
Capabilities
Multimodal • ZeroEval
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
HellaSwag
5-shot • Self-reported
MMLU
5-shot • Self-reported
TruthfulQA
MC2, 10-shot • Self-reported. MC2 scores the normalized probability mass the model assigns to the set of true answer choices; ten in-context examples precede the test question.
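As a rough illustration of how an MC2-style score is computed (a minimal sketch with illustrative names, not the official TruthfulQA harness): the model's log-probabilities for each answer option are exponentiated, and the score is the share of probability mass falling on the true options.

```python
# Hypothetical sketch of MC2 scoring. Inputs are the model's
# log-probabilities for each true and each false answer option,
# conditioned on the question prompt.
import math

def mc2_score(logprobs_true, logprobs_false):
    """Return the normalized probability mass on the true options."""
    p_true = [math.exp(lp) for lp in logprobs_true]
    p_false = [math.exp(lp) for lp in logprobs_false]
    return sum(p_true) / (sum(p_true) + sum(p_false))

# Example with two true and two false options:
score = mc2_score([-1.0, -2.0], [-3.0, -4.0])  # ≈ 0.88
```

A higher score means the model concentrates its probability on truthful answers rather than common misconceptions.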
Winogrande
5-shot • Self-reported. Five worked examples are included in the prompt to demonstrate the task format before the test question, letting the model infer the expected output from examples rather than explicit instructions.
Mathematics
Mathematical problems and computations
GSM8k
8-shot, CoT • Self-reported. The prompt contains eight worked examples with step-by-step (chain-of-thought) solutions, typically cued with phrases like "let's think step by step"; the model is then expected to reason through the new problem in the same way before giving its answer.
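The few-shot chain-of-thought setup can be sketched as simple prompt assembly (an illustrative helper, not any specific evaluation harness):

```python
# Hypothetical sketch of building a few-shot chain-of-thought prompt
# in the style used for GSM8K-type math word problems.
def build_cot_prompt(examples, question):
    """examples: list of (question, step_by_step_solution) pairs."""
    parts = []
    for q, solution in examples:
        # Each demo shows the reasoning trace, not just the final answer.
        parts.append(f"Q: {q}\nA: Let's think step by step. {solution}")
    # The test question ends with the same cue, inviting a reasoning trace.
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)

demos = [("If 3 pens cost $6, what does 1 pen cost?",
          "3 pens cost $6, so one pen costs 6 / 3 = $2. The answer is 2.")]
prompt = build_cot_prompt(demos, "A train travels 60 km in 1.5 h. What is its speed?")
```

For an 8-shot run, `demos` would hold eight such worked pairs instead of one.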
MATH
0-shot, CoT • Self-reported. No examples are provided; the model is prompted to reason step by step before producing a final answer.
MGSM
5-shot • Self-reported
Reasoning
Logical reasoning and analysis
BIG-Bench Hard
0-shot, CoT • Self-reported. The model solves each task without any examples but is prompted to reason step by step before the final answer, making its intermediate reasoning explicit.
GPQA
0-shot, CoT • Self-reported. The model receives the question with no worked examples (0-shot) and is prompted to lay out its chain of reasoning (CoT) before answering. Showing its intermediate steps rather than answering immediately helps on hard questions, with the model relying solely on knowledge acquired during training.
Other Tests
Specialized benchmarks
ARC-C
10-shot • Self-reported. Ten example question-answer pairs precede the test question to establish the task format.
Arena Hard
Standard evaluation • Self-reported. The model answers each prompt directly, without few-shot examples or special prompting, and its responses are judged for quality. This measures baseline ability to answer user queries, though the model may respond differently when given additional instructions or constraints.
BoolQ
2-shot • Self-reported. Two examples are shown first to establish the expected answer format and reasoning. A handful of demonstrations can quickly steer the model's behavior, though for some complex tasks two examples may not be enough to convey the full task.
MMLU-Pro
0-shot, CoT • Self-reported. The model receives the task with no worked examples, plus an instruction such as "Let's think step by step" that prompts it to break the problem into simpler parts and generate intermediate reasoning before the final answer. Research has shown that even without examples, prompting a model to reason sequentially can significantly improve performance on multi-step problems.
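In practice the zero-shot CoT protocol amounts to appending the trigger phrase and then parsing an answer out of the reasoning trace. A minimal sketch (illustrative helper names; the answer-extraction convention of taking the last number is an assumption, not Microsoft's exact harness):

```python
# Hypothetical sketch of zero-shot chain-of-thought prompting.
import re

def zero_shot_cot_prompt(question):
    # No worked examples: the trigger phrase alone elicits step-by-step reasoning.
    return f"Q: {question}\nA: Let's think step by step."

def extract_final_answer(completion):
    # A common convention: take the last number in the reasoning trace.
    nums = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return nums[-1] if nums else None

prompt = zero_shot_cot_prompt("A train travels 60 km in 1.5 h. What is its speed?")
answer = extract_final_answer("60 km in 1.5 h means 60 / 1.5 = 40 km/h.")  # "40"
```

The extracted answer is then compared against the reference to score the benchmark.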
Multilingual MMLU
5-shot • Self-reported
OpenBookQA
10-shot • Self-reported. Ten worked examples precede the test question, demonstrating correct solutions and the expected answer format.
PIQA
5-shot • Self-reported. Five worked examples are provided before the test question, demonstrating the task format and expected answers. Few-shot prompting lets the model adapt to a task through examples rather than additional training; unlike zero-shot or one-shot setups, five examples give the model more context to generalize from, though performance gains typically plateau beyond a certain number of examples.
Social IQa
5-shot • Self-reported
License & Metadata
License
MIT
Announcement Date
February 1, 2025
Last Updated
July 19, 2025
Similar Models
Phi-3.5-mini-instruct
Microsoft
3.8B
Best score: 0.8 (ARC)
Released: Aug 2024
Price: $0.10/1M tokens
Phi 4 Mini Reasoning
Microsoft
3.8B
Best score: 0.5 (GPQA)
Released: Apr 2025
Llama 3.1 8B Instruct
Meta
8.0B
Best score: 0.8 (ARC)
Released: Jul 2024
Price: $0.20/1M tokens
Llama 3.2 3B Instruct
Meta
3.2B
Best score: 0.8 (ARC)
Released: Sep 2024
Price: $0.01/1M tokens
Llama 3.1 Nemotron Nano 8B V1
NVIDIA
8.0B
Best score: 0.5 (GPQA)
Released: Mar 2025
Gemma 2 9B
Google
9.2B
Best score: 0.7 (MMLU)
Released: Jun 2024
Ministral 8B Instruct
Mistral AI
8.0B
Best score: 0.7 (ARC)
Released: Oct 2024
Price: $0.10/1M tokens
Qwen2.5 7B Instruct
Alibaba
7.6B
Best score: 0.8 (HumanEval)
Released: Sep 2024
Price: $0.30/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.