Claude Opus 4
Claude Opus 4 is Anthropic's most powerful model and the strongest coding model in the Claude 4 family. It delivers sustained performance on complex, long-running tasks and agentic workflows. Opus 4 excels at coding and advanced reasoning, and can use tools (such as web search) during extended thinking. It also supports parallel tool execution and features improved memory capabilities.
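The model is served through the Anthropic Messages API. A minimal request sketch follows; the model ID string and the prompt are assumptions based on the May 22, 2025 release, and the network call only runs when an API key is present:

```python
import os

# Request parameters for the Anthropic Messages API.
# NOTE: the model ID below is an assumption inferred from the release date;
# check Anthropic's model documentation for the current identifier.
params = {
    "model": "claude-opus-4-20250514",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Explain what SWE-bench Verified measures."}],
}

if os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic  # requires the `anthropic` package

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(**params)
    print(response.content[0].text)
```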
Key Specifications
Parameters
-
Context
200.0K
Release Date
May 22, 2025
Average Score
64.6%
Timeline
Key dates in the model's history
Announcement
May 22, 2025
Last Update
July 19, 2025
Technical Specifications
Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal • ZeroEval
Pricing & Availability
Input (per 1M tokens)
$15.00
Output (per 1M tokens)
$75.00
Max Input Tokens
200.0K
Max Output Tokens
128.0K
Supported Features
Function Calling • Structured Output • Code Execution • Web Search • Batch Inference • Fine-tuning
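At the listed rates ($15.00 per 1M input tokens, $75.00 per 1M output tokens), the cost of a request can be estimated directly. A minimal sketch; the token counts in the example are illustrative, not from the source:

```python
# Per-million-token rates for Claude Opus 4, taken from the pricing table above.
INPUT_PRICE_PER_MTOK = 15.00   # USD per 1M input tokens
OUTPUT_PRICE_PER_MTOK = 75.00  # USD per 1M output tokens

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

# Hypothetical request: a 50K-token prompt producing a 2K-token reply.
print(estimate_cost_usd(50_000, 2_000))  # 0.9
```

Because output tokens cost five times as much as input tokens here, long generations dominate the bill even when the prompt is much larger than the reply.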
Benchmark Results
Model performance metrics across various tests and benchmarks
Programming
Programming skills tests
SWE-Bench Verified
Parallel test-time compute (multiple attempts; best candidate selected by an internal scoring model). Without extended thinking. Based on footnote 5 and the SWE-bench methodology • Self-reported
Reasoning
Logical reasoning and analysis
GPQA
Diamond subset. Extended thinking (up to 64K tokens) with parallel test-time compute (multiple attempts; selection via an internal scoring model). Based on footnote 5 and the blog post appendix • Self-reported
Other Tests
Specialized benchmarks
AIME 2025
Extended thinking (up to 64K tokens) with parallel test-time compute (multiple attempts; internal selection via a scoring model). Sampled with top_p 0.95. Based on footnotes 4 and 5 and the blog post appendix • Self-reported
ARC-AGI v2
accuracy • Verified
MMMLU
Extended thinking (up to 64K tokens). Average across 14 languages. Based on the blog post appendix and footnote 3 • Self-reported
MMMU (validation)
Extended thinking (up to 64K tokens). Based on the blog post appendix • Self-reported
TAU-bench Airline
Extended thinking with tool use (up to 64K tokens, a prompt addendum, and an increased maximum number of steps). Based on the blog post appendix and the TAU-bench methodology • Self-reported
TAU-bench Retail
Extended thinking with tool use (up to 64K tokens, a prompt addendum, and an increased maximum number of steps). Based on the blog post appendix and the TAU-bench methodology • Self-reported
Terminal-bench
Parallel test-time compute (multiple attempts; selection via an internal scoring model). Without extended thinking mode. Claude Code used as the agentic harness. Based on footnotes 2 and 5 • Self-reported
License & Metadata
License
proprietary
Announcement Date
May 22, 2025
Last Updated
July 19, 2025
Similar Models
Claude Sonnet 4
Anthropic
MM
Best score: 0.8 (GPQA)
Released: May 2025
Price: $3.00/1M tokens
Claude 3 Haiku
Anthropic
MM
Best score: 0.9 (ARC)
Released: Mar 2024
Price: $0.25/1M tokens
Claude 3.7 Sonnet
Anthropic
MM
Best score: 0.8 (GPQA)
Released: Feb 2025
Price: $3.00/1M tokens
Claude 3 Sonnet
Anthropic
MM
Best score: 0.9 (ARC)
Released: Feb 2024
Price: $3.00/1M tokens
Claude 3.5 Sonnet
Anthropic
MM
Best score: 0.9 (HumanEval)
Released: Oct 2024
Price: $3.00/1M tokens
Claude Opus 4.6
Anthropic
MM
Best score: 1.0 (TAU)
Released: Feb 2026
Price: $5.00/1M tokens
Claude Sonnet 4.6
Anthropic
MM
Best score: 0.9 (GPQA)
Released: Feb 2026
Price: $3.00/1M tokens
Claude Opus 4.1
Anthropic
MM
Best score: 0.8 (TAU)
Released: Aug 2025
Price: $15.00/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.