Key Specifications
Parameters
-
Context
400.0K
Release Date
August 7, 2025
Average Score
70.1%
Timeline
Key dates in the model's history
Announcement
August 7, 2025
Technical Specifications
Parameters
-
Training Tokens
-
Knowledge Cutoff
September 30, 2024
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$1.25
Output (per 1M tokens)
$10.00
Max Input Tokens
400.0K
Max Output Tokens
128.0K
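Given the listed rates ($1.25 per 1M input tokens, $10.00 per 1M output tokens), the cost of a single request is straightforward to estimate. A minimal sketch; the rates come from this page and the token counts in the example are illustrative:

```python
# Per-token rates derived from the listed per-million-token prices.
INPUT_RATE = 1.25 / 1_000_000    # USD per input token
OUTPUT_RATE = 10.00 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 50K-token prompt producing a 2K-token completion.
cost = request_cost(50_000, 2_000)
print(f"${cost:.4f}")  # -> $0.0825
```

Note the 8x asymmetry between output and input pricing: for long completions, output tokens dominate the bill even when the prompt is much larger.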
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
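The Structured Output feature constrains model responses to a caller-supplied JSON schema. A minimal sketch of such a schema as a plain Python dict; the field names are illustrative and the wrapper an API expects around the schema varies by provider and is not shown:

```python
import json

# Hypothetical schema for extracting a benchmark result from free text.
# The body is standard JSON Schema; "additionalProperties": False keeps
# the model from emitting fields the caller did not ask for.
benchmark_result_schema = {
    "type": "object",
    "properties": {
        "benchmark": {"type": "string"},
        "score": {"type": "number", "minimum": 0, "maximum": 1},
        "self_reported": {"type": "boolean"},
    },
    "required": ["benchmark", "score", "self_reported"],
    "additionalProperties": False,
}

print(json.dumps(benchmark_result_schema, indent=2))
```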
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
Standard benchmark evaluating knowledge across a wide range of subjects • Self-reported
Programming
Programming skills tests
SWE-Bench Verified
Thinking mode (up to 128K tokens), testing reasoning capabilities and problem-solving approach on real software engineering issues • Self-reported
HumanEval
Code-generation benchmark based on completing Python functions • Self-reported
Mathematics
Mathematical problems and computations
MATH
Thinking mode applied to mathematical problem solving • Self-reported
Reasoning
Logical reasoning and analysis
GPQA
GPT-5, GPQA Diamond, thinking mode without tools. This run uses the model's extended thinking mode, in which the model: 1) works through the problem, laying out several possible approaches; 2) narrows these down to the most promising; 3) evaluates the candidates thoroughly; and 4) writes out a complete solution. This approach is especially effective for mathematical and reasoning-heavy tasks that require deliberate search over solutions, and in many cases it reaches exact, correct answers that simpler prompting would miss. Because tool use is disabled here, the results can be compared directly with tool-enabled runs to gauge the effect of tools on model performance • Self-reported
Multimodal
Working with images and visual data
MMMU
GPT-5 with thinking mode - solving college-level visual reasoning tasks • Self-reported
Other Tests
Specialized benchmarks
Aider-Polyglot
Thinking mode (up to 128K tokens), testing reasoning about and understanding of code across multiple programming languages • Self-reported
SWE-Lancer (IC-Diamond subset)
GPT-5 - IC SWE Diamond freelance coding tasks • Self-reported
AIME 2025
GPT-5 (standard) with thinking mode enabled (without tools) - competition mathematics • Self-reported
HealthBench Hard
Thinking mode on hard health-related queries • Self-reported
FrontierMath
GPT-5 (standard) with thinking mode enabled (Python tool only) - expert-level mathematics, FrontierMath levels 1-3 • Self-reported
HMMT 2025
GPT-5 (standard) with thinking mode enabled (without tools) - Harvard-MIT Mathematics Tournament • Self-reported
Humanity's Last Exam
GPT-5 (standard) with thinking mode (without tools) - a set of expert-level questions across a range of subjects • Self-reported
Scale MultiChallenge
GPT-5 with thinking mode enabled - benchmark for following multi-step instructions • Self-reported
BrowseComp
GPT-5 with thinking mode enabled - benchmark for agentic web search and browsing • Self-reported
COLLIE
GPT-5 with thinking mode enabled - following instructions with constraints on output form • Self-reported
MultiChallenge (o3-mini grader)
GPT-5, graded by o3-mini - instruction-following benchmark with automated accuracy grading • Self-reported
Internal API instruction following (hard)
GPT-5 - evaluation of instruction following via an internal API (hard subset) • Self-reported
Tau2 airline
GPT-5 - function-calling benchmark (airline domain) • Self-reported
Tau2 retail
GPT-5 with thinking mode - function-calling benchmark (retail domain) • Self-reported
Tau2 telecom
GPT-5 with thinking mode - function-calling benchmark (telecom domain) • Self-reported
MMMU-Pro
GPT-5 with thinking mode - solving college-level visual tasks with reasoning • Self-reported
VideoMMMU
GPT-5 with thinking mode - video-based reasoning (256) • Self-reported
CharXiv-R
GPT-5 with thinking mode - reasoning over scientific charts and figures • Self-reported
ERQA
GPT-5 with thinking mode - embodied reasoning question answering • Self-reported
OpenAI-MRCR: 2 needle 128k
OpenAI-MRCR 2-needle retrieval at 128K tokens • Self-reported
OpenAI-MRCR: 2 needle 256k
OpenAI-MRCR 2-needle retrieval at 256K tokens • Self-reported
Graphwalks BFS <128k
Breadth-first search on graphs (Graphwalks BFS, <128k), testing reasoning over long context • Self-reported
Graphwalks parents <128k
Parent-node finding on graphs (<128k), testing reasoning over long context • Self-reported
BrowseComp Long Context 128k
BrowseComp variant with 128k context • Self-reported
BrowseComp Long Context 256k
BrowseComp variant with 256k context • Self-reported
VideoMME w sub.
VideoMME (long) with subtitles • Self-reported
LongFact-Concepts
Thinking mode, long-form factuality on concept-focused queries • Self-reported
LongFact-Objects
Thinking mode, long-form factuality on object-focused queries • Self-reported
FactScore
Thinking mode, evaluating factual accuracy of generated text • Self-reported
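Several of the long-context benchmarks above (OpenAI-MRCR, Graphwalks) are needle-retrieval-style tests: specific facts are planted inside a very long context and the model must find them. A toy illustration of how such a test context can be constructed; the filler and needle sentences here are made up, the real benchmarks use their own data:

```python
import random

def build_haystack(needles, filler_sentence, target_tokens, seed=0):
    """Scatter needle sentences at random positions inside filler text.

    Token count is approximated as whitespace-separated words.
    """
    rng = random.Random(seed)
    n_filler = target_tokens // len(filler_sentence.split())
    chunks = [filler_sentence] * n_filler
    for needle in needles:
        chunks.insert(rng.randrange(len(chunks) + 1), needle)
    return " ".join(chunks)

needles = [
    "The magic number for project Alpha is 7261.",
    "The magic number for project Beta is 1948.",
]
haystack = build_haystack(needles, "The sky was grey that morning.", 1000)
assert all(n in haystack for n in needles)  # both needles are present
```

In the "2 needle" MRCR configurations above, two such facts are planted and the model is scored on retrieving them from 128K- or 256K-token contexts.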
License & Metadata
License
proprietary
Announcement Date
August 7, 2025
Last Updated
July 24, 2025
Similar Models
o1-pro
OpenAI
MM
Best score: 0.8 (GPQA)
Released: Dec 2024
GPT-4o
OpenAI
MM
Best score: 0.9 (HumanEval)
Released: May 2024
Price: $2.50/1M tokens
GPT-5.1 Instant
OpenAI
MM
Best score: 1.0 (TAU)
Released: Nov 2025
Price: $0.30/1M tokens
GPT-5.4 mini
OpenAI
MM
Best score: 0.9 (TAU)
Released: Mar 2026
Price: $0.42/1M tokens
GPT-5.4 nano
OpenAI
MM
Best score: 0.9 (TAU)
Released: Mar 2026
Price: $0.12/1M tokens
GPT-5 mini
OpenAI
MM
Best score: 0.8 (GPQA)
Released: Aug 2025
Price: $0.25/1M tokens
GPT-5.1 High
OpenAI
MM
Best score: 0.9 (GPQA)
Released: Nov 2025
Price: $2.00/1M tokens
GPT-5 High
OpenAI
MM
Best score: 0.9 (GPQA)
Released: Aug 2025
Price: $2.00/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.