Key Specifications
Parameters
-
Context
200.0K
Release Date
April 16, 2025
Average Score
66.5%
Timeline
Key dates in the model's history
Announcement
April 16, 2025
Last Update
July 19, 2025
Today
March 25, 2026
Technical Specifications
Parameters
-
Training Tokens
-
Knowledge Cutoff
May 31, 2024
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$1.10
Output (per 1M tokens)
$4.40
Max Input Tokens
200.0K
Max Output Tokens
100.0K
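The listed rates make per-request cost easy to estimate: input and output tokens are billed separately, each pro-rated against the per-1M price. A minimal sketch (the function name and example token counts are illustrative, not from the source):

```python
# Cost estimate using the rates listed above:
# $1.10 per 1M input tokens, $4.40 per 1M output tokens.

INPUT_PRICE_PER_M = 1.10   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 4.40  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# A request filling the full 200K-token input window and the
# 100K-token output cap:
print(round(request_cost(200_000, 100_000), 2))  # → 0.66
```

Note that output tokens cost 4x as much as input tokens here, so long generations dominate the bill even when prompts are large.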
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
Programming
Programming skills tests
SWE-Bench Verified
Accuracy • Self-reported
Reasoning
Logical reasoning and analysis
GPQA
Accuracy (without tools) • Self-reported
Multimodal
Working with images and visual data
MathVista
Accuracy • Self-reported
MMMU
Accuracy • Self-reported
Other Tests
Specialized benchmarks
Aider-Polyglot
Accuracy (all samples, o4-mini-high) • Self-reported
Aider-Polyglot Edit
Accuracy (diff, o4-mini-high) • Self-reported
AIME 2024
Accuracy (without tools) • Self-reported
AIME 2025
Accuracy (without tools) • Self-reported
BrowseComp
Accuracy (with Python + browsing) • Self-reported
CharXiv-R
Accuracy • Self-reported
Humanity's Last Exam
Accuracy (without tools) • Self-reported
Scale MultiChallenge
Accuracy • Self-reported
TAU-bench Airline
Accuracy (o4-mini-high) • Self-reported
TAU-bench Retail
Accuracy (o4-mini-high) • Self-reported
License & Metadata
License
proprietary
Announcement Date
April 16, 2025
Last Updated
July 19, 2025
Similar Models
GPT-4o
OpenAI
MM
Best score: 0.9 (MMLU)
Released: Aug 2024
Price: $2.50/1M tokens
GPT-4.1 mini
OpenAI
MM
Best score: 0.9 (MMLU)
Released: Apr 2025
Price: $0.40/1M tokens
GPT-4.1
OpenAI
MM
Best score: 0.9 (MMLU)
Released: Apr 2025
Price: $2.00/1M tokens
GPT-4o mini
OpenAI
MM
Best score: 0.9 (HumanEval)
Released: Jul 2024
Price: $0.15/1M tokens
GPT-4.5
OpenAI
MM
Best score: 0.9 (MMLU)
Released: Feb 2025
Price: $75.00/1M tokens
GPT-5 nano
OpenAI
MM
Best score: 0.7 (GPQA)
Released: Aug 2025
Price: $0.05/1M tokens
GPT-4
OpenAI
MM
Best score: 1.0 (ARC)
Released: Jun 2023
Price: $30.00/1M tokens
GPT-5.3 Codex
OpenAI
MM
Released: Feb 2026
Price: $1.75/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.