
GPT-5

Multimodal
OpenAI

GPT-5 is OpenAI's flagship model for coding, reasoning, and agentic tasks across domains, pairing enhanced reasoning capabilities with moderate speed.

Key Specifications

Parameters
-
Context
400.0K
Release Date
August 7, 2025
Average Score
70.1%

Timeline

Key dates in the model's history
Announcement
August 7, 2025

Technical Specifications

Parameters
-
Training Tokens
-
Knowledge Cutoff
September 30, 2024
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$1.25
Output (per 1M tokens)
$10.00
Max Input Tokens
400.0K
Max Output Tokens
128.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
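At these rates, the cost of a request is a linear function of input and output token counts. A minimal sketch of a cost estimator (an illustrative helper, not part of any official SDK):

```python
# Illustrative cost estimator using the listed GPT-5 rates:
# $1.25 per 1M input tokens, $10.00 per 1M output tokens.

INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 10.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 10K-token prompt with a 2K-token response
print(round(estimate_cost(10_000, 2_000), 4))  # 0.0325
```

Note the asymmetry: output tokens cost 8x more than input tokens, so long generations (up to the 128K output cap) dominate the bill.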

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
MMLU
Standard benchmark evaluating knowledge across a wide range of subjects. Self-reported
92.5%

Programming

Programming skills tests
SWE-Bench Verified
Thinking mode (up to 128K tokens), evaluating reasoning capabilities and problem-solving on real-world software engineering issues. Self-reported
74.9%
HumanEval
Code-generation benchmark: completing Python functions. Self-reported
93.4%

Mathematics

Mathematical problems and computations
MATH
Thinking mode: solving mathematical problems. Self-reported
84.7%

Reasoning

Logical reasoning and analysis
GPQA
GPT-5, GPQA Diamond, thinking mode without tools. This methodology differs from more standard chain-of-thought: in "diamond-thinking," the model (1) surveys the problem and generates several candidate approaches, (2) selects the most promising ones, (3) evaluates them thoroughly, and (4) works the chosen approach through to a complete solution. This is especially effective for mathematical and reasoning-heavy tasks that require exploring the solution space; in many cases the model reaches correct solutions that simpler thinking methods would miss. Because no tools are used, the results can be compared directly against tool-assisted runs to isolate the tools' effect on model performance. Self-reported
85.7%

Multimodal

Working with images and visual data
MMMU
GPT-5 with thinking mode: solving college-level visual reasoning tasks. Self-reported
84.2%

Other Tests

Specialized benchmarks
Aider-Polyglot
Thinking mode (up to 128K tokens): reasoning over and editing code in multiple programming languages. Self-reported
88.0%
SWE-Lancer (IC-Diamond subset)
GPT-5: IC SWE Diamond freelance coding tasks. Self-reported
100.0%
AIME 2025
GPT-5 (standard) with thinking mode enabled, no tools: competition mathematics. Self-reported
94.6%
HealthBench Hard
Thinking mode on difficult health-related conversations. Self-reported
1.6%
FrontierMath
GPT-5 (standard) with thinking mode enabled, Python tool only: expert-level mathematics, FrontierMath levels 1-3. Self-reported
26.3%
HMMT 2025
GPT-5 (standard) with thinking mode enabled, no tools: Harvard-MIT Mathematics Tournament. Self-reported
93.3%
Humanity's Last Exam
GPT-5 (standard) with thinking mode, no tools: a set of expert-level questions across many subjects. Self-reported
24.8%
Scale MultiChallenge
GPT-5 with thinking mode enabled: benchmark for following multi-step instructions. Self-reported
69.6%
BrowseComp
GPT-5 with thinking mode enabled: benchmark for agentic web search and browsing. Self-reported
54.9%
COLLIE
GPT-5 with thinking mode enabled: following instructions that constrain the form of the output. Self-reported
99.0%
MultiChallenge (o3-mini grader)
GPT-5, graded by o3-mini: instruction-following benchmark with automated accuracy evaluation. Self-reported
69.6%
Internal API instruction following (hard)
GPT-5: evaluation of instruction following via an internal API (hard subset). Self-reported
64.0%
Tau2 airline
GPT-5: function-calling benchmark (airline domain). Self-reported
62.6%
Tau2 retail
GPT-5 with thinking mode: function-calling benchmark (retail domain). Self-reported
81.1%
Tau2 telecom
GPT-5 with thinking mode: function-calling benchmark (telecom domain). Self-reported
96.7%
MMMU-Pro
GPT-5 with thinking mode: more challenging visual tasks requiring multi-step reasoning. Self-reported
78.4%
VideoMMMU
GPT-5 with thinking mode: video understanding and reasoning (256). Self-reported
84.6%
CharXiv-R
GPT-5 with thinking mode: reasoning over scientific charts and figures. Self-reported
81.1%
ERQA
GPT-5 with thinking mode. Self-reported
65.7%
OpenAI-MRCR: 2 needle 128k
OpenAI-MRCR 2-needle retrieval at 128K tokens. Self-reported
95.2%
OpenAI-MRCR: 2 needle 256k
OpenAI-MRCR 2-needle retrieval at 256K tokens. Self-reported
86.8%
Graphwalks BFS <128k
Graphwalks breadth-first search (<128K): multi-hop reasoning over long context. Self-reported
78.3%
Graphwalks parents <128k
Graphwalks parent-finding (<128K): multi-hop reasoning over long context. Self-reported
73.3%
BrowseComp Long Context 128k
BrowseComp variant with 128K context. Self-reported
90.0%
BrowseComp Long Context 256k
BrowseComp variant with 256K context. Self-reported
88.8%
VideoMME w sub.
VideoMME (long videos) with subtitles. Self-reported
86.7%
LongFact-Concepts
Thinking mode: long-form factuality on concept-centric queries. Self-reported
0.7%
LongFact-Objects
Thinking mode: long-form factuality on object-centric queries. Self-reported
0.8%
FactScore
Thinking mode: evaluation of factual accuracy. Self-reported
1.0%

License & Metadata

License
proprietary
Announcement Date
August 7, 2025
Last Updated
July 24, 2025

Similar Models

All Models

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.
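The similarity heuristic described above can be sketched as a weighted score over those characteristics. The weights, field names, and handling of missing values below are illustrative assumptions, not the catalog's actual implementation:

```python
# Toy similarity score between two model records, mirroring the
# catalog's stated criteria: developer organization, multimodality,
# parameter size, and benchmark performance. All weights (0.25 each)
# and field names are illustrative assumptions.

def similarity(a: dict, b: dict) -> float:
    score = 0.0
    if a["developer"] == b["developer"]:
        score += 0.25
    if a["multimodal"] == b["multimodal"]:
        score += 0.25
    # Parameter counts may be unknown ("-" on this page); compare
    # only when both are present, using ratio-based closeness.
    pa, pb = a.get("params"), b.get("params")
    if pa and pb:
        score += 0.25 * min(pa, pb) / max(pa, pb)
    # Benchmark closeness: 1 minus the absolute gap in average score
    # (scores expressed as fractions in [0, 1]).
    score += 0.25 * (1 - abs(a["avg_score"] - b["avg_score"]))
    return score

gpt5 = {"developer": "OpenAI", "multimodal": True, "params": None, "avg_score": 0.701}
other = {"developer": "OpenAI", "multimodal": True, "params": None, "avg_score": 0.650}
print(round(similarity(gpt5, other), 3))  # 0.737
```

Because GPT-5's parameter count is undisclosed, a real recommender would have to down-weight or skip that term, as the sketch does when either value is missing.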