Phi-4-mini-reasoning

Microsoft

Phi-4-mini-reasoning is designed for multi-step, logic-intensive mathematical problem-solving in memory- and compute-constrained environments and latency-sensitive scenarios. Use cases include formal proof generation, symbolic computation, complex word problems, and a wide range of mathematical reasoning scenarios. The model excels at maintaining context across steps, applying structured logic, and delivering accurate, reliable solutions in domains that require deep analytical thinking.
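
A minimal sketch of running the model locally with Hugging Face Transformers follows. It assumes the hub id "microsoft/Phi-4-mini-reasoning" and the library's standard chat-template API; treat it as an illustration, not an official usage example.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face model id; adjust if the hub name differs.
model_id = "microsoft/Phi-4-mini-reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# A multi-step math word problem, the model's target workload.
messages = [{"role": "user", "content": "A train travels 120 km in 1.5 hours. At the same speed, how long does 200 km take?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))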

Key Specifications

Parameters
3.8B
Context
-
Release Date
April 30, 2025
Average Score
68.0%
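
The average appears to be the unweighted mean of the three self-reported benchmark scores listed under Benchmark Results below; a quick check of that assumption:

# Assumption: "Average Score" = unweighted mean of the three benchmarks below.
scores = {"GPQA Diamond": 52.0, "AIME": 57.5, "MATH-500": 94.6}
print(f"{sum(scores.values()) / len(scores):.1f}%")  # -> 68.0%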

Timeline

Key dates in the model's history
Announcement
April 30, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
3.8B
Training Tokens
150B
Knowledge Cutoff
February 1, 2025
Family
-
Capabilities
Multimodal, ZeroEval

Benchmark Results

Model performance metrics across various tests and benchmarks

Reasoning

Logical reasoning and analysis
GPQA Diamond
The Diamond subset of GPQA (Graduate-Level Google-Proof Q&A): expert-written multiple-choice science questions designed to resist simple lookup; the score is the proportion of questions answered correctly.
Self-reported
52.0%

Other Tests

Specialized benchmarks
AIME
Problems from the American Invitational Mathematics Examination, a competition-level mathematics exam; the score is the proportion of problems whose final answer is correct. Answer-only accuracy makes models easy to compare, but it reveals nothing about the model's reasoning and cannot distinguish a genuine solution from a lucky guess.
Self-reported
57.5%
MATH-500
A 500-problem subset of the MATH benchmark of competition mathematics problems; answers are scored by exact match against the reference solution.
Self-reported
94.6%
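
All three scores above are answer-accuracy metrics: the fraction of problems whose final answer matches the reference. A minimal sketch of that scoring rule follows; the normalization here is hypothetical and far simpler than what real MATH/AIME grading harnesses use.

def exact_match_accuracy(predictions, references):
    # Hypothetical, deliberately simple normalization; real harnesses
    # parse boxed answers, fractions, units, etc.
    normalize = lambda s: s.strip().lower()
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return correct / len(references)

print(exact_match_accuracy(["5", "1/2"], ["5", "0.5"]))  # -> 0.5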

License & Metadata

License
MIT
Announcement Date
April 30, 2025
Last Updated
July 19, 2025
