
o3

Multimodal
OpenAI

OpenAI's most powerful reasoning model. o3 is versatile across domains, setting a new standard for math, science, coding, and visual reasoning tasks. It also excels at technical writing and instruction following. Use it for multi-step problems that involve analyzing text, code, and images.

Key Specifications

Parameters
-
Context
200.0K
Release Date
April 16, 2025
Average Score
63.4%

Timeline

Key dates in the model's history
Announcement
April 16, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
-
Training Tokens
-
Knowledge Cutoff
May 31, 2024
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$2.00
Output (per 1M tokens)
$8.00
Max Input Tokens
200.0K
Max Output Tokens
100.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
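The listed pricing ($2.00 per 1M input tokens, $8.00 per 1M output tokens) makes per-request cost easy to estimate. A minimal sketch; the helper function name is an illustration, not part of any official SDK:

```python
# Hypothetical cost estimator based on the pricing listed above:
# $2.00 per 1M input tokens, $8.00 per 1M output tokens.

INPUT_PRICE_PER_M = 2.00   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 8.00  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a 50K-token prompt with a 10K-token response
print(round(estimate_cost(50_000, 10_000), 4))  # 0.18
```

Note that output tokens cost 4x input tokens, so long reasoning outputs dominate the bill; the 100K max-output limit caps a single response at $0.80.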

Benchmark Results

Model performance metrics across various tests and benchmarks

Programming

Programming skills tests
SWE-Bench Verified
accuracy (self-reported)
69.1%

Reasoning

Logical reasoning and analysis
GPQA
Diamond subset, thinking without tools (self-reported)
83.3%

Multimodal

Working with images and visual data
MathVista
accuracy (self-reported)
86.8%
MMMU
thinking mode; visual tasks solved with multimodal reasoning (self-reported)
82.9%

Other Tests

Specialized benchmarks
Aider-Polyglot
accuracy, full set (self-reported)
81.3%
AIME 2024
accuracy without tools (self-reported)
91.6%
AIME 2025
pass@1 without tools (self-reported)
86.4%
ARC-AGI
evaluation on test set (self-reported)
88.0%
ARC-AGI v2
accuracy (verified)
6.5%
BrowseComp
accuracy with Python + tools (self-reported)
49.7%
CharXiv-R
thinking mode; chart understanding and analysis (self-reported)
78.6%
FrontierMath
accuracy (self-reported)
15.8%
Humanity's Last Exam
accuracy without tools (self-reported)
20.2%
Humanity's Last Exam
thinking mode with tools (Python + tools); expert-level questions across subjects (self-reported)
24.3%
Humanity's Last Exam
thinking mode without tools; expert-level questions across subjects (self-reported)
14.7%
Scale MultiChallenge
accuracy (self-reported)
56.5%
Scale MultiChallenge
thinking mode; multi-turn instruction-following benchmark (self-reported)
60.4%
COLLIE
thinking mode; following constrained instructions over text (self-reported)
98.4%
Tau2 airline
thinking mode; agentic function-calling benchmark, airline domain (self-reported)
64.8%
Tau2 retail
thinking mode; agentic function-calling benchmark, retail domain (self-reported)
80.2%
Tau2 telecom
thinking mode; agentic function-calling benchmark, telecom domain (self-reported)
58.2%
MMMU-Pro
thinking mode; visual tasks solved with multimodal reasoning (self-reported)
76.4%
VideoMMMU
thinking mode; video-based reasoning, 256-frame setting (self-reported)
83.3%
ERQA
thinking mode; embodied reasoning question answering (self-reported)
64.0%
Tau-bench
accuracy, average over Airline/Retail (self-reported)
63.0%

License & Metadata

License
proprietary
Announcement Date
April 16, 2025
Last Updated
July 19, 2025
