
Claude Opus 4.6

Multimodal
Anthropic

Claude Opus 4.6 is Anthropic's most intelligent model for building agents and for coding. Coding is significantly improved: more thorough planning, sustained performance on long agentic tasks, reliable behavior in large codebases, and better code review and debugging. Context window: 200K tokens by default, with 1M tokens available in beta at premium pricing ($10/$37.50 per million input/output tokens beyond 200K). Up to 128K output tokens. New API features: adaptive thinking (the model decides when to use extended thinking), effort control (low/medium/high/max), and context compression for long-running tasks. Leads on Terminal-Bench 2.0 (agentic coding), Humanity's Last Exam (multidisciplinary reasoning), GDPval-AA (knowledge work in finance and law), BrowseComp (information retrieval), and DeepSearchQA (deep agentic search). Supports agent teams in Claude Code, Claude in Excel, and Claude in PowerPoint.
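
A minimal sketch of calling the model through the Anthropic Messages API (Python SDK). The model ID and the effort-control field below are assumptions drawn from the feature list above, not confirmed parameter names; `extra_body` is used so the unverified field stays explicit.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-6",  # hypothetical model ID for this listing
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Review this function for bugs: ..."}
    ],
    # Assumption: effort control (low/medium/high/max) is exposed as a
    # request field named "effort"; check the official docs for the
    # actual field name before relying on this.
    extra_body={"effort": "high"},
)
print(response.content[0].text)
```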

Key Specifications

Parameters
-
Context
1.0M
Release Date
February 4, 2026
Average Score
80.9%

Timeline

Key dates in the model's history
Announcement
February 4, 2026
Last Update
February 6, 2026

Technical Specifications

Parameters
-
Training Tokens
-
Knowledge Cutoff
May 1, 2025
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$5.00
Output (per 1M tokens)
$25.00
Max Input Tokens
1.0M
Max Output Tokens
128.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
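
A worked example of the tiered pricing above: $5.00/$25.00 per million input/output tokens at standard rates, and the long-context beta rates of $10.00/$37.50 once input exceeds 200K tokens. Whether the premium rate applies to the whole request or only to the marginal tokens is not stated in the listing; the sketch below assumes the whole request.

```python
def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost from the listed per-million-token rates.

    Assumption: the beta long-context rates apply to the entire request
    once input exceeds 200K tokens (the listing only says ">200K").
    """
    if input_tokens > 200_000:
        in_rate, out_rate = 10.00, 37.50  # 1M-context beta pricing
    else:
        in_rate, out_rate = 5.00, 25.00   # standard pricing
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a 300K-token input with a 20K-token reply costs
# 300_000 * $10/1M + 20_000 * $37.50/1M = $3.00 + $0.75 = $3.75
print(f"${request_cost_usd(300_000, 20_000):.2f}")
```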

Benchmark Results

Model performance metrics across various tests and benchmarks

Programming

Programming skills tests
SWE-Bench Verified
SWE-Bench Verified — solving real tasks from GitHub issues. Self-reported
78.0%

Reasoning

Logical reasoning and analysis
GPQA
Accuracy on GPQA Diamond. Self-reported
91.3%

Other Tests

Specialized benchmarks
Vending-Bench 2
Final net worth in USD. Simulates running a vending business over a year of operation, starting with $5,000. Self-reported
100.0%
GDPval-AA
Elo rating. Independent evaluation by Artificial Analysis. Outperforms GPT-5.2 by ~144 Elo points and Claude Opus 4.5 by 190 points. Self-reported
53.5%
AIME 2025
Accuracy, Consensus@64 (the most frequently occurring answer among 64 samples; see the voting sketch after this list). Independent evaluation by Artificial Analysis. Self-reported
100.0%
TAU2 Telecom
Tool use (τ2-bench Telecom). Self-reported
99.0%
Graphwalks Parents >128K
GraphWalks Parents, 256K–1M. F1 score at 1M context, averaged over 5 attempts. Self-reported
95.0%
MRCR v2 (8-needle)
OpenAI MRCR v2, 8-needle, 256K–1M. Mean match ratio at 1M context, averaged over 5 attempts. Self-reported
93.0%
Humanity's Last Exam
Accuracy on the HLE benchmark. Self-reported
46.2%
BrowseComp
Accuracy on BrowseComp — agentic search for hard-to-find information. Self-reported
72.0%
ARC-AGI v2
ARC-AGI-2 — abstraction and reasoning puzzles. Self-reported
68.8%
CharXiv-R
CharXiv-R — reasoning about scientific charts from arXiv papers. Self-reported
74.0%
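
The AIME 2025 entry above is scored with Consensus@64: sample 64 answers per problem and keep the one that occurs most often. A minimal sketch of that voting rule, assuming final answers have already been extracted as strings:

```python
from collections import Counter

def consensus_at_k(samples: list[str]) -> str:
    """Return the most frequently occurring answer among the samples.

    Ties break toward the answer seen first; the evaluation's actual
    tie-breaking rule is not stated in the listing.
    """
    return Counter(samples).most_common(1)[0][0]

# Example: 64 sampled answers to a single problem
samples = ["204"] * 40 + ["210"] * 20 + ["42"] * 4
assert consensus_at_k(samples) == "204"
```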

License & Metadata

License
Proprietary
Announcement Date
February 4, 2026
Last Updated
February 6, 2026
