o3-mini

OpenAI

A smaller version of o3 that is expected to offer improved multimodal capabilities, more advanced logical reasoning, and more efficient resource usage than previous models, while maintaining high performance on core tasks.

Key Specifications

Parameters
-
Context
200K tokens
Release Date
January 30, 2025
Average Score
56.9%

Timeline

Key dates in the model's history
Announcement
January 30, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
-
Training Tokens
-
Knowledge Cutoff
September 30, 2023
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$1.10
Output (per 1M tokens)
$4.40
Max Input Tokens
200.0K
Max Output Tokens
100.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
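The per-token prices and token limits above can be turned into a rough per-request cost estimate. The following is a minimal sketch in Python, assuming billing is a simple linear function of input and output tokens at the listed rates. The function name `estimate_cost_usd` and the constant names are hypothetical, and any hidden reasoning tokens that are billed as output would have to be counted in `output_tokens`.

```python
# Prices and limits copied from the listing above.
INPUT_USD_PER_MILLION = 1.10
OUTPUT_USD_PER_MILLION = 4.40
MAX_INPUT_TOKENS = 200_000
MAX_OUTPUT_TOKENS = 100_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    if not 0 <= input_tokens <= MAX_INPUT_TOKENS:
        raise ValueError("input_tokens is outside the listed input limit")
    if not 0 <= output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError("output_tokens is outside the listed output limit")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# A request with 10,000 input tokens and 2,000 output tokens.
example_cost = estimate_cost_usd(10_000, 2_000)

# A request that uses both listed limits in full.
max_cost = estimate_cost_usd(MAX_INPUT_TOKENS, MAX_OUTPUT_TOKENS)

print(f"example: ${example_cost:.4f}, maximum: ${max_cost:.2f}")
```

Under these assumptions, the output rate is four times the input rate, so long generations dominate the cost of a request.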

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
MMLU
o3-mini (high). The description is a solution prompt for AIME mathematics tasks: the model states that it will study the task thoroughly, split it into subtasks, and solve each one step by step, using all necessary mathematical tools, with the goal of obtaining the correct answer. The stated procedure is: 1. Analyze the task, noting all important details and what must be found. 2. Outline a general solution strategy and the key concepts that can be used. 3. Write out the solution in steps, with full justification of each step. 4. Check the solution against the whole task. 5. Give the final answer in the required format (usually a number from 0 to 999). The model also checks its work for computational errors and considers other approaches if one proves too complex. Self-reported
86.9%

Programming

Programming skills tests
SWE-Bench Verified
Verified predictions: in this evaluation method the model is given a question with context, and its answer is compared with a reference answer prepared in advance. An answer can also be checked by other means, for example by another model or by a human. Verification is useful for assessing a model's actual accuracy, especially for frontier models whose outputs, such as complex mathematical arguments, can be difficult to verify. In such cases it is important to rely on evaluation methods that can check results against established knowledge. Self-reported
49.3%

Mathematics

Mathematical problems and computations
MATH
o3-mini (high). Evaluated on several mathematical tasks, from school level upward. The model examines a question from several points of view and works through solutions, but it does not handle some of the more complex tasks that need deep understanding. Strengths: equation tasks and probability. Limitations: computational errors, especially in complex tasks, and incomplete understanding of the methods it tries to use. Overall, the model solves mathematical tasks at about high-school level and handles them well, but struggles with tasks that require deeper understanding or thinking. Self-reported
97.9%
MGSM
Model: o3-mini, temperature 0.7. Description: o3-mini run with a high temperature (0.7). A high temperature lets the model explore more possible answers, which can be useful for some tasks or for generating diverse outputs. However, it can reduce the accuracy of answers compared with a lower temperature. Self-reported
92.0%
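The temperature trade-off described for MGSM can be illustrated with temperature-scaled softmax, the standard way sampling temperature reshapes a next-token distribution. This is a generic sketch, not a description of how o3-mini samples: the logits are made-up values and `softmax_with_temperature` is a hypothetical helper name.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three candidate tokens (illustrative only).
logits = [2.0, 1.0, 0.0]

low_temp_probs = softmax_with_temperature(logits, 0.2)
high_temp_probs = softmax_with_temperature(logits, 0.7)

print("T=0.2:", [round(p, 3) for p in low_temp_probs])
print("T=0.7:", [round(p, 3) for p in high_temp_probs])
```

At the higher temperature the top candidate keeps less of the probability mass, so sampling produces more varied answers and picks the most likely one less often.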

Reasoning

Logical reasoning and analysis
GPQA
Diamond subset of GPQA. Self-reported
77.2%

Other Tests

Specialized benchmarks
Aider-Polyglot
Benchmark evaluation. Self-reported
66.7%
Aider-Polyglot Edit
Benchmark evaluation. Self-reported
60.4%
AIME 2024
Evaluation on the test set. Self-reported
87.3%
COLLIE
Benchmark evaluation. Self-reported
98.7%
ComplexFuncBench
Benchmark evaluation. Self-reported
17.6%
FrontierMath
pass@1. Self-reported
9.2%
Graphwalks BFS <128k
Benchmark result. Self-reported
51.0%
Graphwalks parents <128k
Benchmark evaluation. Self-reported
58.3%
IFEval
Benchmark evaluation. Self-reported
93.9%
Internal API instruction following (hard)
Performance evaluation. Self-reported
50.0%
LiveBench
o3-mini (high). A GPT-type model that answers questions about the world and works well with information without relying on tools. Advantages: answers to factual queries. Limitations: no tools or capabilities for solving complex tasks that require computation. Suited to obtaining facts and data and to knowledge queries. Self-reported
84.6%
MultiChallenge
Performance score. Self-reported
39.9%
MultiChallenge (o3-mini grader)
Performance score in tests. Self-reported
50.2%
Multi-IF
Benchmark evaluation. Self-reported
79.5%
Multilingual MMLU
Benchmark evaluation. Self-reported
80.7%
OpenAI-MRCR: 2 needle 128k
Benchmark evaluation. Self-reported
18.7%
SimpleQA
Accuracy. Self-reported
15.0%
SWE-Lancer
Percentage score. Self-reported
18.0%
SWE-Lancer (IC-Diamond subset)
Percentage score. Self-reported
7.4%
TAU-bench Airline
Benchmark evaluation. Self-reported
32.4%
TAU-bench Retail
Benchmark evaluation. Self-reported
57.6%
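The 56.9% average score listed under Key Specifications is consistent with an unweighted mean of the 26 benchmark scores on this page. The page does not state how the average is computed, so the equal weighting in this Python sketch is an assumption; the scores themselves are copied from the results above.

```python
# Benchmark scores (percent) copied from the results listed on this page.
scores = {
    "MMLU": 86.9,
    "SWE-Bench Verified": 49.3,
    "MATH": 97.9,
    "MGSM": 92.0,
    "GPQA": 77.2,
    "Aider-Polyglot": 66.7,
    "Aider-Polyglot Edit": 60.4,
    "AIME 2024": 87.3,
    "COLLIE": 98.7,
    "ComplexFuncBench": 17.6,
    "FrontierMath": 9.2,
    "Graphwalks BFS <128k": 51.0,
    "Graphwalks parents <128k": 58.3,
    "IFEval": 93.9,
    "Internal API instruction following (hard)": 50.0,
    "LiveBench": 84.6,
    "MultiChallenge": 39.9,
    "MultiChallenge (o3-mini grader)": 50.2,
    "Multi-IF": 79.5,
    "Multilingual MMLU": 80.7,
    "OpenAI-MRCR: 2 needle 128k": 18.7,
    "SimpleQA": 15.0,
    "SWE-Lancer": 18.0,
    "SWE-Lancer (IC-Diamond subset)": 7.4,
    "TAU-bench Airline": 32.4,
    "TAU-bench Retail": 57.6,
}

# Assumption: every benchmark carries equal weight.
average_score = sum(scores.values()) / len(scores)

print(f"{len(scores)} benchmarks, unweighted mean = {average_score:.1f}%")
```

Because the mean mixes saturated benchmarks (COLLIE, MATH) with very hard ones (FrontierMath, SWE-Lancer IC-Diamond), it is best read as a coarse summary rather than a measure of any single capability.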

License & Metadata

License
proprietary
Announcement Date
January 30, 2025
Last Updated
July 19, 2025
