
Claude Opus 4.1

Multimodal
Anthropic

Claude Opus 4.1 is a hybrid reasoning model that pushes the boundaries in coding and AI agents, equipped with a 200K token context window. It delivers superior performance and accuracy for real-world coding tasks and agentic applications, handling complex multi-step problems with thoroughness and attention to detail. With extended thinking capabilities, the model offers instant responses or detailed step-by-step reasoning visible through user-friendly summaries. It advances state-of-the-art coding performance to 74.5% on SWE-bench Verified, excels at agentic search and research, and produces human-quality content with exceptional writing abilities. The model supports 32K output tokens and adapts to specific coding styles, delivering exceptional quality for large-scale code generation and refactoring projects.
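The extended-thinking mode described above is exposed as a request option in the Anthropic Messages API. The sketch below shows what such a request payload might look like; the `budget_tokens` value and the prompt are illustrative assumptions, not taken from this page.

```python
# Hypothetical request payload for the Anthropic Messages API.
# The thinking budget and prompt below are illustrative assumptions.
payload = {
    "model": "claude-opus-4-1",
    "max_tokens": 32000,  # matches the 32K output-token limit listed below
    "thinking": {
        "type": "enabled",
        "budget_tokens": 16000,  # portion of the output budget reserved for reasoning
    },
    "messages": [
        {"role": "user", "content": "Refactor this module and explain each step."}
    ],
}
```

Note that the thinking budget must fit inside `max_tokens`, since reasoning tokens count against the same output limit.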

Key Specifications

Parameters
-
Context
200.0K
Release Date
August 5, 2025
Average Score
72.7%

Timeline

Key dates in the model's history
Announcement / Last Update
August 5, 2025

Technical Specifications

Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$15.00
Output (per 1M tokens)
$75.00
Max Input Tokens
200.0K
Max Output Tokens
32.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
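The listed rates imply a simple per-request cost formula: tokens divided by one million, times the per-1M rate for each direction. A minimal sketch (the function name is mine; the rates are those listed above):

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_rate: float = 15.00, output_rate: float = 75.00) -> float:
    """Estimate the cost of one request at the per-1M-token rates listed above."""
    return (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate

# A maximal request (200K input, 32K output):
# 0.2 * $15.00 + 0.032 * $75.00 = $3.00 + $2.40, i.e. about $5.40
print(request_cost_usd(200_000, 32_000))
```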

Benchmark Results

Model performance metrics across various tests and benchmarks

Programming

Programming skills tests
SWE-Bench Verified
Without extended thinking, using bash and file-editing tools. Results reported over 500 tasks. Self-reported
74.5%

Reasoning

Logical reasoning and analysis
GPQA
Diamond subset. Extended thinking (up to 64K tokens). Self-reported
80.9%

Other Tests

Specialized benchmarks
Terminal-bench
Without extended thinking. Terminus 1 agent, averaged over 5 trials. Self-reported
43.3%
TAU-bench Retail
Extended thinking with tool use (up to 64K tokens), with a prompt addendum and the maximum number of steps increased from 30 to 100. Self-reported
82.4%
TAU-bench Airline
Extended thinking with tool use (up to 64K tokens), with a prompt addendum and the maximum number of steps increased from 30 to 100. Self-reported
56.0%
MMMLU
Extended thinking (up to 64K tokens). Average over 14 languages. Self-reported
89.5%
MMMU (validation)
Extended thinking (up to 64K tokens). Self-reported
77.1%
AIME 2025
Extended thinking (up to 64K tokens). Sampled with top_p = 0.95. Self-reported
78.0%

License & Metadata

License
Proprietary
Announcement Date
August 5, 2025
Last Updated
August 5, 2025

Similar Models


Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.