o3
Multimodal
OpenAI's most powerful reasoning model. o3 is a versatile and powerful model across domains, setting a new standard for math, science, coding, and visual reasoning tasks. It also excels at technical writing and instruction following. Use it for multi-step problems that involve analyzing text, code, and images.
Key Specifications
Parameters
-
Context
200.0K
Release Date
April 16, 2025
Average Score
63.4%
Timeline
Key dates in the model's history
Announcement
April 16, 2025
Last Update
July 19, 2025
Technical Specifications
Parameters
-
Training Tokens
-
Knowledge Cutoff
May 31, 2024
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$2.00
Output (per 1M tokens)
$8.00
Max Input Tokens
200.0K
Max Output Tokens
100.0K
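Given the listed rates ($2.00 per 1M input tokens, $8.00 per 1M output tokens), the cost of a single request is straightforward to estimate. A minimal sketch, using only the prices from the table above:

```python
# Estimate the USD cost of one o3 request from the listed prices:
# $2.00 per 1M input tokens, $8.00 per 1M output tokens.

INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 8.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 10K-token prompt with a 2K-token completion.
print(round(request_cost(10_000, 2_000), 4))  # → 0.036
```

Note that output pricing is 4x input pricing, so long completions (up to the 100K max output tokens) dominate cost for generation-heavy workloads.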
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
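Function calling works by sending a JSON Schema description of each tool alongside the prompt; the model then returns a structured call rather than free text. A minimal sketch of the request payload shape, assuming the OpenAI Chat Completions tools format (the `get_weather` tool and its fields are invented for illustration; no network call is made):

```python
import json

# Hypothetical function-calling payload for o3. The tool name and its
# parameters are made up for illustration; the overall shape follows
# the OpenAI Chat Completions "tools" format.
payload = {
    "model": "o3",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city.",
                "parameters": {  # JSON Schema for the tool's arguments
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"}
                    },
                    "required": ["city"],
                },
            },
        }
    ],
}

# Serializing confirms the payload is plain, valid JSON.
body = json.dumps(payload)
print(len(body) > 0)  # → True
```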
Benchmark Results
Model performance metrics across various tests and benchmarks
Programming
Programming skills tests
SWE-Bench Verified
accuracy • Self-reported
Reasoning
Logical reasoning and analysis
GPQA
Diamond, without tools • Self-reported
Multimodal
Working with images and visual data
MathVista
accuracy • Self-reported
MMMU
OpenAI o3 with thinking mode - solving visual tasks using multimodal reasoning • Self-reported
Other Tests
Specialized benchmarks
Aider-Polyglot
accuracy (full) • Self-reported
AIME 2024
accuracy (without tools) • Self-reported
AIME 2025
pass@1 (without tools) • Self-reported
ARC-AGI
evaluation on test set • Self-reported
ARC-AGI v2
accuracy • Verified
BrowseComp
accuracy (with Python + tools) • Self-reported
CharXiv-R
OpenAI o3 with thinking mode - chart reasoning and analysis • Self-reported
FrontierMath
accuracy • Self-reported
Humanity's Last Exam
accuracy (without tools) • Self-reported
Humanity's Last Exam
OpenAI o3 with thinking mode (Python + tools) - expert-level questions across various subjects • Self-reported
Humanity's Last Exam
OpenAI o3 with thinking mode (without tools) - expert-level questions across various subjects • Self-reported
Scale MultiChallenge
accuracy • Self-reported
Scale MultiChallenge
OpenAI o3 with thinking mode - instruction-following benchmark • Self-reported
COLLIE
OpenAI o3 with thinking mode - text-level instruction following • Self-reported
Tau2 airline
OpenAI o3 with thinking mode - function-calling benchmark • Self-reported
Tau2 retail
OpenAI o3 with thinking mode - function-calling benchmark • Self-reported
Tau2 telecom
OpenAI o3 with thinking mode - function-calling benchmark • Self-reported
MMMU-Pro
OpenAI o3 with thinking mode - solving visual tasks using reasoning • Self-reported
VideoMMMU
OpenAI o3 with thinking mode - video-based reasoning (256 frames) • Self-reported
ERQA
OpenAI o3 with thinking mode - embodied reasoning • Self-reported
Tau-bench
accuracy (average for Airline/Retail) • Self-reported
License & Metadata
License
proprietary
Announcement Date
April 16, 2025
Last Updated
July 19, 2025
Similar Models
GPT-4o
OpenAI
MM
Best score:0.9 (MMLU)
Released:Aug 2024
Price:$2.50/1M tokens
GPT-4o mini
OpenAI
MM
Best score:0.9 (HumanEval)
Released:Jul 2024
Price:$0.15/1M tokens
GPT-4.1
OpenAI
MM
Best score:0.9 (MMLU)
Released:Apr 2025
Price:$2.00/1M tokens
GPT-4.5
OpenAI
MM
Best score:0.9 (MMLU)
Released:Feb 2025
Price:$75.00/1M tokens
GPT-5 nano
OpenAI
MM
Best score:0.7 (GPQA)
Released:Aug 2025
Price:$0.05/1M tokens
o1-pro
OpenAI
MM
Best score:0.8 (GPQA)
Released:Dec 2024
GPT-4
OpenAI
MM
Best score:1.0 (ARC)
Released:Jun 2023
Price:$30.00/1M tokens
GPT-4o
OpenAI
MM
Best score:0.9 (HumanEval)
Released:May 2024
Price:$2.50/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.