Key Specifications
Parameters
-
Context
400.0K
Release Date
August 7, 2025
Average Score
70.1%
Timeline
Key dates in the model's history
Announcement
August 7, 2025
Technical Specifications
Parameters
-
Training Tokens
-
Knowledge Cutoff
September 30, 2024
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$1.25
Output (per 1M tokens)
$10.00
Max Input Tokens
400.0K
Max Output Tokens
128.0K
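Given the listed rates ($1.25 per 1M input tokens, $10.00 per 1M output tokens), the cost of a single request is straightforward to estimate. A minimal sketch; the rates come from this page and the token counts in the example are illustrative:

```python
# Per-token rates derived from the listed per-million-token prices.
INPUT_RATE = 1.25 / 1_000_000    # USD per input token
OUTPUT_RATE = 10.00 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 50K-token prompt producing a 2K-token completion.
cost = request_cost(50_000, 2_000)
print(f"${cost:.4f}")  # -> $0.0825
```

Note the 8x asymmetry between output and input pricing: for long completions, output tokens dominate the bill even when the prompt is much larger.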
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
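The Structured Output feature constrains model responses to a caller-supplied JSON schema. A minimal sketch of such a schema as a plain Python dict; the field names are illustrative and the wrapper an API expects around the schema varies by provider and is not shown:

```python
import json

# Hypothetical schema for extracting a benchmark result from free text.
# The body is standard JSON Schema; "additionalProperties": False keeps
# the model from emitting fields the caller did not ask for.
benchmark_result_schema = {
    "type": "object",
    "properties": {
        "benchmark": {"type": "string"},
        "score": {"type": "number", "minimum": 0, "maximum": 1},
        "self_reported": {"type": "boolean"},
    },
    "required": ["benchmark", "score", "self_reported"],
    "additionalProperties": False,
}

print(json.dumps(benchmark_result_schema, indent=2))
```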
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
Standard benchmark evaluating knowledge across a wide range of subjects • Self-reported
Programming
Programming skills tests
SWE-Bench Verified
Thinking mode (up to 128K tokens), testing reasoning capabilities and problem-solving approach on real software engineering issues • Self-reported
HumanEval
Code-generation benchmark based on completing Python functions • Self-reported
Mathematics
Mathematical problems and computations
MATH
Thinking mode applied to mathematical problem solving • Self-reported
Reasoning
Logical reasoning and analysis
GPQA
GPT-5, GPQA Diamond, thinking mode without tools. This run uses the model's extended thinking mode, in which the model: 1) works through the problem, laying out several possible approaches; 2) narrows these down to the most promising; 3) evaluates the candidates thoroughly; and 4) writes out a complete solution. This approach is especially effective for mathematical and reasoning-heavy tasks that require deliberate search over solutions, and in many cases it reaches exact, correct answers that simpler prompting would miss. Because tool use is disabled here, the results can be compared directly with tool-enabled runs to gauge the effect of tools on model performance • Self-reported
Multimodal
Working with images and visual data
MMMU
GPT-5 with thinking mode - solving college-level visual reasoning tasks • Self-reported
Other Tests
Specialized benchmarks
Aider-Polyglot
Thinking mode (up to 128K tokens), testing reasoning about and understanding of code across multiple programming languages • Self-reported
SWE-Lancer (IC-Diamond subset)
GPT-5 - IC SWE Diamond freelance coding tasks • Self-reported
AIME 2025
GPT-5 (standard) with thinking mode enabled (without tools) - competition mathematics • Self-reported
HealthBench Hard
Thinking mode on hard health-related queries • Self-reported
FrontierMath
GPT-5 (standard) with thinking mode enabled (Python tool only) - expert-level mathematics, FrontierMath levels 1-3 • Self-reported
HMMT 2025
GPT-5 (standard) with thinking mode enabled (without tools) - Harvard-MIT Mathematics Tournament • Self-reported
Humanity's Last Exam
GPT-5 (standard) with thinking mode (without tools) - a set of expert-level questions across a range of subjects • Self-reported
Scale MultiChallenge
GPT-5 with thinking mode enabled - benchmark for following multi-step instructions • Self-reported
BrowseComp
GPT-5 with thinking mode enabled - benchmark for agentic web search and browsing • Self-reported
COLLIE
GPT-5 with thinking mode enabled - following instructions with constraints on output form • Self-reported
MultiChallenge (o3-mini grader)
GPT-5, graded by o3-mini - instruction-following benchmark with automated accuracy grading • Self-reported
Internal API instruction following (hard)
GPT-5 - evaluation of instruction following via an internal API (hard subset) • Self-reported
Tau2 airline
GPT-5 - function-calling benchmark (airline domain) • Self-reported
Tau2 retail
GPT-5 with thinking mode - function-calling benchmark (retail domain) • Self-reported
Tau2 telecom
GPT-5 with thinking mode - function-calling benchmark (telecom domain) • Self-reported
MMMU-Pro
GPT-5 with thinking mode - solving college-level visual tasks with reasoning • Self-reported
VideoMMMU
GPT-5 with thinking mode - video-based reasoning (256) • Self-reported
CharXiv-R
GPT-5 with thinking mode - reasoning over scientific charts and figures • Self-reported
ERQA
GPT-5 with thinking mode - embodied reasoning question answering • Self-reported
OpenAI-MRCR: 2 needle 128k
OpenAI-MRCR 2-needle retrieval at 128K tokens • Self-reported
OpenAI-MRCR: 2 needle 256k
OpenAI-MRCR 2-needle retrieval at 256K tokens • Self-reported
Graphwalks BFS <128k
Breadth-first search on graphs (Graphwalks BFS, <128k), testing reasoning over long context • Self-reported
Graphwalks parents <128k
Parent-node finding on graphs (<128k), testing reasoning over long context • Self-reported
BrowseComp Long Context 128k
BrowseComp variant with 128k context • Self-reported
BrowseComp Long Context 256k
BrowseComp variant with 256k context • Self-reported
VideoMME w sub.
VideoMME (long) with subtitles • Self-reported
LongFact-Concepts
Thinking mode, long-form factuality on concept-focused queries • Self-reported
LongFact-Objects
Thinking mode, long-form factuality on object-focused queries • Self-reported
FactScore
Thinking mode, evaluating factual accuracy of generated text • Self-reported
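Several of the long-context benchmarks above (OpenAI-MRCR, Graphwalks) are needle-retrieval-style tests: specific facts are planted inside a very long context and the model must find them. A toy illustration of how such a test context can be constructed; the filler and needle sentences here are made up, the real benchmarks use their own data:

```python
import random

def build_haystack(needles, filler_sentence, target_tokens, seed=0):
    """Scatter needle sentences at random positions inside filler text.

    Token count is approximated as whitespace-separated words.
    """
    rng = random.Random(seed)
    n_filler = target_tokens // len(filler_sentence.split())
    chunks = [filler_sentence] * n_filler
    for needle in needles:
        chunks.insert(rng.randrange(len(chunks) + 1), needle)
    return " ".join(chunks)

needles = [
    "The magic number for project Alpha is 7261.",
    "The magic number for project Beta is 1948.",
]
haystack = build_haystack(needles, "The sky was grey that morning.", 1000)
assert all(n in haystack for n in needles)  # both needles are present
```

In the "2 needle" MRCR configurations above, two such facts are planted and the model is scored on retrieving them from 128K- or 256K-token contexts.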
License & Metadata
License
proprietary
Announcement Date
August 7, 2025
Last Updated
July 24, 2025
Similar Models
o1-pro
OpenAI
MM
Best score: 0.8 (GPQA)
Released: Dec 2024
GPT-4o
OpenAI
MM
Best score: 0.9 (HumanEval)
Released: May 2024
Price: $2.50/1M tokens
GPT-5.1 Instant
OpenAI
MM
Best score: 1.0 (TAU)
Released: Nov 2025
Price: $0.30/1M tokens
GPT-5.4 mini
OpenAI
MM
Best score: 0.9 (TAU)
Released: Mar 2026
Price: $0.42/1M tokens
GPT-5.4 nano
OpenAI
MM
Best score: 0.9 (TAU)
Released: Mar 2026
Price: $0.12/1M tokens
GPT-5 mini
OpenAI
MM
Best score: 0.8 (GPQA)
Released: Aug 2025
Price: $0.25/1M tokens
GPT-5.1 High
OpenAI
MM
Best score: 0.9 (GPQA)
Released: Nov 2025
Price: $2.00/1M tokens
GPT-5 High
OpenAI
MM
Best score: 0.9 (GPQA)
Released: Aug 2025
Price: $2.00/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.