o3
Multimodal
OpenAI's most powerful reasoning model. o3 is a versatile and powerful model across domains, setting a new standard for math, science, coding, and visual reasoning tasks. It also excels at technical writing and instruction following. Use it for multi-step problems that involve analyzing text, code, and images.
Key Specifications
Parameters
-
Context
200.0K
Release Date
April 16, 2025
Average Score
63.4%
Timeline
Key dates in the model's history
Announcement
April 16, 2025
Last Update
July 19, 2025
Technical Specifications
Parameters
-
Training Tokens
-
Knowledge Cutoff
May 31, 2024
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$2.00
Output (per 1M tokens)
$8.00
Max Input Tokens
200.0K
Max Output Tokens
100.0K
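Given the listed rates ($2.00 per 1M input tokens, $8.00 per 1M output tokens), the cost of a single request is straightforward to estimate. A minimal sketch, using only the prices from the table above:

```python
# Estimate the USD cost of one o3 request from the listed prices:
# $2.00 per 1M input tokens, $8.00 per 1M output tokens.

INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 8.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 10K-token prompt with a 2K-token completion.
print(round(request_cost(10_000, 2_000), 4))  # → 0.036
```

Note that output pricing is 4x input pricing, so long completions (up to the 100K max output tokens) dominate cost for generation-heavy workloads.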
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
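Function calling works by sending a JSON Schema description of each tool alongside the prompt; the model then returns a structured call rather than free text. A minimal sketch of the request payload shape, assuming the OpenAI Chat Completions tools format (the `get_weather` tool and its fields are invented for illustration; no network call is made):

```python
import json

# Hypothetical function-calling payload for o3. The tool name and its
# parameters are made up for illustration; the overall shape follows
# the OpenAI Chat Completions "tools" format.
payload = {
    "model": "o3",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city.",
                "parameters": {  # JSON Schema for the tool's arguments
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"}
                    },
                    "required": ["city"],
                },
            },
        }
    ],
}

# Serializing confirms the payload is plain, valid JSON.
body = json.dumps(payload)
print(len(body) > 0)  # → True
```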
Benchmark Results
Model performance metrics across various tests and benchmarks
Programming
Programming skills tests
SWE-Bench Verified
accuracy • Self-reported
Reasoning
Logical reasoning and analysis
GPQA
Diamond, without tools • Self-reported
Multimodal
Working with images and visual data
MathVista
accuracy • Self-reported
MMMU
OpenAI o3 with thinking mode - solving visual tasks using multimodal reasoning • Self-reported
Other Tests
Specialized benchmarks
Aider-Polyglot
accuracy (full) • Self-reported
AIME 2024
accuracy (without tools) • Self-reported
AIME 2025
pass@1 (without tools) • Self-reported
ARC-AGI
evaluation on test set • Self-reported
ARC-AGI v2
accuracy • Verified
BrowseComp
accuracy (with Python + tools) • Self-reported
CharXiv-R
OpenAI o3 with thinking mode - chart reasoning and analysis • Self-reported
FrontierMath
accuracy • Self-reported
Humanity's Last Exam
accuracy (without tools) • Self-reported
Humanity's Last Exam
OpenAI o3 with thinking mode (Python + tools) - expert-level questions across various subjects • Self-reported
Humanity's Last Exam
OpenAI o3 with thinking mode (without tools) - expert-level questions across various subjects • Self-reported
Scale MultiChallenge
accuracy • Self-reported
Scale MultiChallenge
OpenAI o3 with thinking mode - instruction-following benchmark • Self-reported
COLLIE
OpenAI o3 with thinking mode - text-level instruction following • Self-reported
Tau2 airline
OpenAI o3 with thinking mode - function-calling benchmark • Self-reported
Tau2 retail
OpenAI o3 with thinking mode - function-calling benchmark • Self-reported
Tau2 telecom
OpenAI o3 with thinking mode - function-calling benchmark • Self-reported
MMMU-Pro
OpenAI o3 with thinking mode - solving visual tasks using reasoning • Self-reported
VideoMMMU
OpenAI o3 with thinking mode - video-based reasoning (256 frames) • Self-reported
ERQA
OpenAI o3 with thinking mode - embodied reasoning • Self-reported
Tau-bench
accuracy (average for Airline/Retail) • Self-reported
License & Metadata
License
proprietary
Announcement Date
April 16, 2025
Last Updated
July 19, 2025
Similar Models
GPT-4o
OpenAI
MM
Best score:0.9 (MMLU)
Released:Aug 2024
Price:$2.50/1M tokens
GPT-4o mini
OpenAI
MM
Best score:0.9 (HumanEval)
Released:Jul 2024
Price:$0.15/1M tokens
GPT-4.1
OpenAI
MM
Best score:0.9 (MMLU)
Released:Apr 2025
Price:$2.00/1M tokens
GPT-4.5
OpenAI
MM
Best score:0.9 (MMLU)
Released:Feb 2025
Price:$75.00/1M tokens
GPT-5 nano
OpenAI
MM
Best score:0.7 (GPQA)
Released:Aug 2025
Price:$0.05/1M tokens
o1-pro
OpenAI
MM
Best score:0.8 (GPQA)
Released:Dec 2024
GPT-4
OpenAI
MM
Best score:1.0 (ARC)
Released:Jun 2023
Price:$30.00/1M tokens
GPT-4o
OpenAI
MM
Best score:0.9 (HumanEval)
Released:May 2024
Price:$2.50/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.