o1-mini

Name: o1-mini
Author: OpenAI

OpenAI

o1-mini is a cost-effective language model developed by OpenAI, designed for complex reasoning tasks while minimizing computational resources.

Key Specifications

Parameters

Context

128.0K

Release Date

September 12, 2024

Average Score

71.9%

Research Paper Results Blog

Timeline

Key dates in the model's history

Announcement

September 12, 2024

Last Update

July 19, 2025

Today

May 10, 2026

Technical Specifications

Parameters

Training Tokens

Knowledge Cutoff

Family

Capabilities

MultimodalZeroEval

Pricing & Availability

Input (per 1M tokens)

$3.00

Output (per 1M tokens)

$12.00

Max Input Tokens

128.0K

Max Output Tokens

65.5K

Supported Features

Function CallingStructured OutputCode ExecutionWeb SearchBatch InferenceFine-tuning

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding

MMLU

Reasoning by chain with example AI: I'll analyze this problem step by step. For step n, f(n) represents the total number of jumps to reach n. Base cases: f(0) = 0 (already at the start) f(1) = 1 (only one way to reach 1, jumping from 0) For n ≥ 2, we can reach n by jumping from n-1 or n-2. So f(n) = f(n-1) + f(n-2) This gives us the Fibonacci sequence. f(2) = f(1) + f(0) = 1 + 0 = 1 f(3) = f(2) + f(1) = 1 + 1 = 2 f(4) = f(3) + f(2) = 2 + 1 = 3 f(5) = f(4) + f(3) = 3 + 2 = 5 f(6) = f(5) + f(4) = 5 + 3 = 8 f(7) = f(6) + f(5) = 8 + 5 = 13 f(8) = f(7) + f(6) = 13 + 8 = 21 f(9) = f(8) + f(7) = 21 + 13 = 34 f(10) = f(9) + f(8) = 34 + 21 = 55 The answer is 55 • Self-reported

85.2%

Programming

Programming skills tests

HumanEval

Accuracy Pass@1 Accuracy Pass@1 — this percentage solved tasks at first attempt. We one solution for each tasks and we verify its. If this solution correct, we we consider task This score especially useful for scenarios, where model should give correct answer with first attempts. However he not accounts for ability model correct errors through several attempts, that can in with model in capacity by programming • Self-reported

92.4%

Reasoning

Logical reasoning and analysis

GPQA

0-shot Chain of Thought Method (Diamond) offers improvement for process thinking models. consists in that, in order to structure thinking model in form "", with on task, then field thinking and various approaches, and finally in order to on answer. Approach uses example (0-shot) method chains reasoning (Chain of Thought), in order to model reasoning without demonstration examples. This method especially useful for complex tasks, requiring analysis and solutions • Self-reported

60.0%

Other Tests

Specialized benchmarks

Cybersecurity CTFs

Pass@12 accuracy This metric measures efficiency solutions tasks coding, evaluating, can whether model correctly solve problem although would one times for 12 attempts (or number attempts). This more way evaluation, than measurement accuracy with first attempts, and he better reflects use, when users can several solutions and When Pass@k: - Model generates n solutions for tasks - them manner k solutions - is considered if although would one from k solutions works correctly Usually are used such how Pass@1, Pass@10 or Pass@100. if at model is probability p solve task for one attempt, then probability solve her/its although would one times for k attempts 1-(1-p)^k • Self-reported

28.7%

MATH-500

0-shot Chain of Thought AI: 0-shot chain thinking • Self-reported

90.0%

SuperGLUE

Evaluation on set AI: On validation set of ~400 problems, my model gets ~78% of the problems correct. This is a substantial increase over model baselines I am comparing against, which get ~55% to ~70% of problems correct. It's important to evaluate carefully. I follow 3 rules in my evaluation: 1. The solution must be correct. For problems with numerical or simple symbolic answers (e.g. "x = 5" or "72 degrees"), I check if the answer is present at the end of the model's solution. For problems with more complex symbolic answers, I manually check if the solution is correct. 2. The solution must have no hallucinations or made-up facts. I manually review all examples in my validation set to ensure the chain-of-thought is correct. 3. I avoid problems that might have appeared in my model's training data. I source most of my problems from recent competitions, or create them myself. This ensures my model is not simply memorizing answers. Humans have always been the gold standard. I show that with the right methods, AI can demonstrate similar abilities and clear reasoning. I include a discussion of my model's failure cases to highlight where it still falls short • Self-reported

75.0%

License & Metadata

License

proprietary

Announcement Date

September 12, 2024

Last Updated

July 19, 2025

Articles about o1-mini

What Happens When AI Gets Infinite Compute? OpenAI's Disturbing Discovery

OpenAI's research team reveals that models go 'insane' when given unlimited reasoning tokens — raising new questions about AI scaling and safety.

March 22, 2026

7 min

Similar Models

All Models

GPT-4 Turbo

OpenAI

Best score:0.9 (HumanEval)

Released:Apr 2024

Price:$10.00/1M tokens

o1

OpenAI

Best score:0.9 (MMLU)

Released:Dec 2024

Price:$15.00/1M tokens

o1-preview

OpenAI

Best score:0.9 (MMLU)

Released:Sep 2024

Price:$15.00/1M tokens

GPT-5 Codex

OpenAI

Released:Sep 2025

Price:$2.00/1M tokens

o3-mini

OpenAI

Best score:0.9 (MMLU)

Released:Jan 2025

Price:$1.10/1M tokens

GPT-3.5 Turbo

OpenAI

Best score:0.7 (MMLU)

Released:Mar 2023

Price:$0.50/1M tokens

o3

OpenAI

Best score:0.8 (GPQA)

Released:Apr 2025

Price:$2.00/1M tokens

GPT-4.5

OpenAI

Best score:0.9 (MMLU)

Released:Feb 2025

Price:$75.00/1M tokens

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.