
Claude Opus 4

Multimodal
Anthropic

Claude Opus 4 is the most powerful model in Anthropic's Claude 4 family, billed by Anthropic as the world's best coding model. It delivers sustained performance on complex, long-running tasks and agentic workflows. Opus 4 excels at coding and advanced reasoning, and it can use tools (such as web search) during extended thinking. It supports parallel tool execution and features improved memory capabilities.

Key Specifications

Parameters: -
Context: 200K tokens
Release Date: May 22, 2025
Average Score: 64.6%

Timeline

Key dates in the model's history
Announcement: May 22, 2025
Last Update: July 19, 2025

Technical Specifications

Parameters: -
Training Tokens: -
Knowledge Cutoff: -
Family: -
Capabilities: Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens): $15.00
Output (per 1M tokens): $75.00
Max Input Tokens: 200K
Max Output Tokens: 128K
Supported Features: Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
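
The listed rates translate into a per-request cost with simple arithmetic. The following is a minimal sketch, assuming billing is strictly linear in token counts at the prices and limits shown above; the function name `estimate_cost` and the constants are hypothetical and are not part of any Anthropic SDK.

```python
from decimal import Decimal

# Prices and limits as listed on this page (USD per 1M tokens).
# Assumption: cost is linear in tokens, with no discounts or surcharges.
INPUT_PRICE_PER_MTOK = Decimal("15.00")
OUTPUT_PRICE_PER_MTOK = Decimal("75.00")
MAX_INPUT_TOKENS = 200_000
MAX_OUTPUT_TOKENS = 128_000


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request at the listed rates."""
    if not 0 <= input_tokens <= MAX_INPUT_TOKENS:
        raise ValueError("input_tokens is outside the listed 0..200K range")
    if not 0 <= output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError("output_tokens is outside the listed 0..128K range")
    total = (input_tokens * INPUT_PRICE_PER_MTOK
             + output_tokens * OUTPUT_PRICE_PER_MTOK)
    return total / Decimal(1_000_000)


if __name__ == "__main__":
    # 50K input tokens and 4K output tokens.
    print(estimate_cost(50_000, 4_000))
```

For example, a request with 50,000 input tokens and 4,000 output tokens comes to $0.75 + $0.30 = $1.05 at these rates. `Decimal` is used rather than `float` so that currency amounts stay exact.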

Benchmark Results

Model performance metrics across various tests and benchmarks

Programming

Programming skills tests
SWE-Bench Verified: 72.5% (self-reported)
Test-time compute (several attempts, with selection by model evaluation). Without extended thinking. Based on note 5 and the SWE-bench methodology.
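
The "several attempts, with selection by model evaluation" setup mentioned in the benchmark note is commonly called best-of-n selection. The following is a generic sketch of that idea only, not Anthropic's actual evaluation harness; `best_of_n` and the `score` callable are hypothetical names, and the scorer stands in for whatever model-based evaluation is used.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the highest-scoring candidate (the first one wins ties)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    # max() evaluates score once per candidate and keeps the first maximum.
    return max(candidates, key=score)


if __name__ == "__main__":
    # Toy scorer for illustration only: prefer the longest candidate.
    attempts = ["patch A", "patch BB", "patch C"]
    print(best_of_n(attempts, score=len))
```

In a real evaluation the candidates would be independently sampled model outputs and the scorer would be a separate model or test suite; both are outside the scope of this sketch.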

Reasoning

Logical reasoning and analysis
GPQA: 79.6% (self-reported)
Diamond subset. Extended thinking (up to 64K tokens) with test-time compute (several attempts, with selection based on model evaluation). Based on note 5 and the blog appendix.

Other Tests

Specialized benchmarks
AIME 2025: 75.5% (self-reported)
Extended thinking (up to 64K tokens) with test-time compute (several attempts, with internal model-based selection). Sampled with top_p 0.95. Based on notes 4 and 5 and the blog.
ARC-AGI v2: 8.6% (verified)
Metric: accuracy.
MMMLU: 88.8% (self-reported)
Extended thinking (up to 64K tokens). Averaged over 14 languages. Based on the blog and note 3.
MMMU (validation): 76.5% (self-reported)
Extended thinking (up to 64K tokens). Based on the blog.
TAU-bench Airline: 59.6% (self-reported)
Extended thinking with tool use (up to 64K tokens, a prompt addendum, and an increased maximum number of steps). Based on the blog appendix and the TAU-bench methodology.
TAU-bench Retail: 81.4% (self-reported)
Extended thinking with tool use (up to 64K tokens, a prompt addendum, and an increased maximum number of steps). Based on the blog appendix and the TAU-bench methodology.
Terminal-bench: 39.2% (self-reported)
Test-time compute (several attempts, with selection by model evaluation). Without extended thinking mode. Claude Code used in an agentic capacity. Based on notes 2 and 5.
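
The page does not say how the Average Score of 64.6% under Key Specifications is derived. The sketch below checks one plausible reading, assuming it is the unweighted mean of the nine benchmark scores listed above; that assumption is ours, not something the source states.

```python
from statistics import mean

# Benchmark scores (percent) exactly as listed on this page.
scores = {
    "SWE-Bench Verified": 72.5,
    "GPQA": 79.6,
    "AIME 2025": 75.5,
    "ARC-AGI v2": 8.6,
    "MMMLU": 88.8,
    "MMMU (validation)": 76.5,
    "TAU-bench Airline": 59.6,
    "TAU-bench Retail": 81.4,
    "Terminal-bench": 39.2,
}

# Assumption: every benchmark carries equal weight.
average = mean(scores.values())

if __name__ == "__main__":
    print(f"{average:.1f}%")
```

The nine scores sum to 581.7, and 581.7 / 9 rounds to 64.6, which matches the listed figure. Because the mean is unweighted, the single low ARC-AGI v2 result pulls the average down noticeably.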

License & Metadata

License: Proprietary
Announcement Date: May 22, 2025
Last Updated: July 19, 2025
