o1-mini

OpenAI

o1-mini is a cost-effective language model developed by OpenAI, designed for complex reasoning tasks while minimizing computational resources.

Key Specifications

Parameters
-
Context
128.0K
Release Date
September 12, 2024
Average Score
71.9%

Timeline

Key dates in the model's history
Announcement
September 12, 2024
Last Update
July 19, 2025

Technical Specifications

Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$3.00
Output (per 1M tokens)
$12.00
Max Input Tokens
128.0K
Max Output Tokens
65.5K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
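A quick sketch of what the listed rates imply for a single request ($3.00 per 1M input tokens, $12.00 per 1M output tokens; the example token counts are hypothetical):

```python
INPUT_PER_M = 3.00   # USD per 1M input tokens
OUTPUT_PER_M = 12.00  # USD per 1M output tokens

def request_cost(input_tokens, output_tokens):
    # Linear pricing: each side billed pro rata per million tokens
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# e.g. a request with 10k input tokens and 2k output tokens
print(f"${request_cost(10_000, 2_000):.3f}")  # $0.054
```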

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
MMLU
Chain-of-thought reasoning with a worked example. The model analyzes a problem step by step; for instance, counting the ways to reach step n using jumps of 1 or 2: the base cases are f(0) = 0 and f(1) = 1, and for n ≥ 2 step n is reached from n-1 or n-2, so f(n) = f(n-1) + f(n-2), the Fibonacci recurrence. Computing forward: f(2) = 1, f(3) = 2, f(4) = 3, f(5) = 5, f(6) = 8, f(7) = 13, f(8) = 21, f(9) = 34, f(10) = 55, giving the answer 55. Self-reported.
85.2%
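The chain-of-thought example above can be checked with a short script (an illustration of the recurrence, not part of the benchmark itself):

```python
def jumps(n):
    # f(n): number of ways to reach step n using jumps of 1 or 2,
    # i.e. the Fibonacci recurrence f(n) = f(n-1) + f(n-2)
    a, b = 0, 1  # f(0), f(1)
    for _ in range(n):
        a, b = b, a + b
    return a

print(jumps(10))  # 55
```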

Programming

Programming skills tests
HumanEval
Pass@1 accuracy: the percentage of tasks solved on the first attempt. One solution is generated per task and verified; if it is correct, the task counts as solved. This score is especially useful for scenarios where the model must produce a correct answer on its first try. However, it does not account for the model's ability to correct errors over several attempts, which can understate its programming capability. Self-reported.
92.4%
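A minimal sketch of how Pass@1 is aggregated, assuming each task's single generated solution has already been run against its unit tests:

```python
def pass_at_1(results):
    # results: one boolean per task, True if the single generated
    # solution passed all of that task's unit tests
    return sum(results) / len(results)

print(pass_at_1([True, True, False, True]))  # 0.75
```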

Reasoning

Logical reasoning and analysis
GPQA
0-shot Chain of Thought (Diamond). The model is prompted to structure its thinking: restate the task, reason through various approaches, and only then commit to a final answer. The zero-shot (0-shot) variant elicits chain-of-thought reasoning without any demonstration examples. This method is especially useful for complex tasks that require careful analysis. Self-reported.
60.0%
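A minimal sketch of a zero-shot chain-of-thought prompt; the exact prompt used for this benchmark is not given here, so this template is an assumption:

```python
def zero_shot_cot_prompt(question):
    # Zero-shot CoT: no worked examples, just an instruction to
    # reason step by step before committing to a final answer
    return (
        f"Question: {question}\n"
        "Think through the problem step by step, considering several "
        "approaches, then state your final answer on the last line "
        "as 'Answer: <choice>'."
    )

print(zero_shot_cot_prompt("Which of the following quantities is conserved?"))
```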

Other Tests

Specialized benchmarks
Cybersecurity CTFs
Pass@12 accuracy. This metric measures whether the model can solve a task correctly in at least one of 12 attempts (or, more generally, k attempts). It is a more lenient form of evaluation than first-attempt accuracy, and it better reflects real-world use, where users can request several solutions and pick one that works. Under Pass@k: the model generates n solutions per task, k of them are sampled, and the task counts as solved if at least one of the k solutions works correctly. Common variants are Pass@1, Pass@10, and Pass@100. If the model solves a task with probability p in a single attempt, the probability of solving it at least once in k attempts is 1-(1-p)^k. Self-reported.
28.7%
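The closing formula above can be sketched directly (the 28.7% Pass@12 figure here is measured, not derived; the p = 0.1 below is just an illustrative per-attempt rate):

```python
def pass_at_k_prob(p, k):
    # Probability of at least one success in k independent attempts,
    # given per-attempt success probability p: 1 - (1 - p)^k
    return 1 - (1 - p) ** k

# A 10% single-attempt rate compounds substantially over 12 attempts
print(round(pass_at_k_prob(0.1, 12), 3))  # 0.718
```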
MATH-500
0-shot Chain of Thought. Self-reported.
90.0%
SuperGLUE
Evaluation on the validation set. Self-reported.
75.0%

License & Metadata

License
proprietary
Announcement Date
September 12, 2024
Last Updated
July 19, 2025
