Claude Opus 4

Name: Claude Opus 4
Author: Anthropic

Multimodal


Claude Opus 4 is the most powerful model in Anthropic's Claude 4 family, presented at launch as the world's best coding model. It delivers sustained performance on complex, long-running tasks and agentic workflows. Opus 4 excels at coding and advanced reasoning, and it can use tools (such as web search) during extended thinking. It supports parallel tool execution and features improved memory capabilities.

Key Specifications

Parameters

Context: 200K

Release Date: May 22, 2025

Average Score: 64.6%


Timeline

Key dates in the model's history

Announcement: May 22, 2025

Last Update: July 19, 2025


Technical Specifications

Parameters

Training Tokens

Knowledge Cutoff

Family

Capabilities

Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens): $15.00

Output (per 1M tokens): $75.00

Max Input Tokens: 200K

Max Output Tokens: 128K
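
To make the pricing concrete, the following is a minimal sketch of a per-request cost estimate built only from the prices and token limits listed above. The names `estimate_cost` and `CostEstimate` are hypothetical, and the sketch assumes flat per-token billing with no discounts or surcharges, which the listing does not confirm.

```python
from dataclasses import dataclass

# Values copied from the listing above. All names here are illustrative
# assumptions, not part of any official SDK.
INPUT_USD_PER_MILLION = 15.00
OUTPUT_USD_PER_MILLION = 75.00
MAX_INPUT_TOKENS = 200_000
MAX_OUTPUT_TOKENS = 128_000


@dataclass(frozen=True)
class CostEstimate:
    input_usd: float
    output_usd: float

    @property
    def total_usd(self) -> float:
        return self.input_usd + self.output_usd


def estimate_cost(input_tokens: int, output_tokens: int) -> CostEstimate:
    """Estimate the cost of one request, assuming flat per-token pricing."""
    if not 0 <= input_tokens <= MAX_INPUT_TOKENS:
        raise ValueError(f"input_tokens must be between 0 and {MAX_INPUT_TOKENS}")
    if not 0 <= output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"output_tokens must be between 0 and {MAX_OUTPUT_TOKENS}")
    return CostEstimate(
        input_usd=input_tokens / 1_000_000 * INPUT_USD_PER_MILLION,
        output_usd=output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION,
    )


# Example: a 50,000-token prompt with a 4,000-token reply.
estimate = estimate_cost(input_tokens=50_000, output_tokens=4_000)
print(f"${estimate.total_usd:.2f}")  # prints $1.05
```

Because output tokens cost five times as much as input tokens at these rates, long responses tend to dominate the bill even when the prompt is much larger.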

Supported Features

Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning

Benchmark Results

Model performance metrics across various tests and benchmarks

Programming

Programming skills tests

SWE-Bench Verified

Test-time compute (several attempts, with the result selected by model evaluation). Without extended thinking. Based on note 5 and the SWE-bench methodology for test-time compute • Self-reported

72.5%

Reasoning

Logical reasoning and analysis

GPQA

Diamond: Extended thinking (up to 64K tokens) with test-time compute (several attempts, with the result selected by model evaluation). Based on note 5 and the blog appendix • Self-reported

79.6%

Other Tests

Specialized benchmarks

AIME 2025

Extended thinking (up to 64K tokens) with test-time compute (several attempts, with the result selected by an internal model based on its evaluations). Sampled with top_p 0.95. Based on notes 4 and 5 and the blog • Self-reported

75.5%

ARC-AGI v2

accuracy • Verified

8.6%

MMMLU

Extended thinking (up to 64K tokens). Average over 14 languages. Based on the blog and note 3 • Self-reported

88.8%

MMMU (validation)

Extended thinking (up to 64K tokens). Based on the blog • Self-reported

76.5%

TAU-bench Airline

Extended thinking with tool use (up to 64K tokens, a prompt addendum, and an increased maximum number of steps). Based on the blog appendix and the TAU-bench methodology • Self-reported

59.6%

TAU-bench Retail

Extended thinking with tool use (up to 64K tokens, a prompt addendum, and an increased maximum number of steps). Based on the blog appendix and the TAU-bench methodology • Self-reported

81.4%

Terminal-bench

Test-time compute (several attempts, with the result selected by model evaluation). Without extended thinking. Claude Code used in an agentic capacity. Based on notes 2 and 5 • Self-reported

39.2%
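
The page does not say how the Average Score of 64.6% is derived. As a quick check, the sketch below takes the unweighted mean of the nine benchmark scores listed in this section, which happens to reproduce that figure. Treat the formula as an observation rather than the catalog's documented method.

```python
from statistics import mean

# Scores copied from the benchmark results above (percent).
scores = {
    "SWE-Bench Verified": 72.5,
    "GPQA": 79.6,
    "AIME 2025": 75.5,
    "ARC-AGI v2": 8.6,
    "MMMLU": 88.8,
    "MMMU (validation)": 76.5,
    "TAU-bench Airline": 59.6,
    "TAU-bench Retail": 81.4,
    "Terminal-bench": 39.2,
}

# Unweighted mean, rounded to one decimal place (assumed formula).
average = round(mean(list(scores.values())), 1)
print(f"{average}%")  # prints 64.6%
```

An unweighted mean gives the low ARC-AGI v2 result the same weight as every other benchmark, so the headline average says little about performance in any single category.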

License & Metadata

License: proprietary

Announcement Date: May 22, 2025

Last Updated: July 19, 2025

Similar Models


Claude Sonnet 4

Anthropic

Best score: 0.8 (GPQA)

Released: May 2025

Price: $3.00/1M tokens

Claude 3 Haiku

Anthropic

Best score: 0.9 (ARC)

Released: Mar 2024

Price: $0.25/1M tokens

Claude 3.7 Sonnet

Anthropic

Best score: 0.8 (GPQA)

Released: Feb 2025

Price: $3.00/1M tokens

Claude 3 Sonnet

Anthropic

Best score: 0.9 (ARC)

Released: Feb 2024

Price: $3.00/1M tokens

Claude 3.5 Sonnet

Anthropic

Best score: 0.9 (HumanEval)

Released: Oct 2024

Price: $3.00/1M tokens

Claude Opus 4.6

Anthropic

Best score: 1.0 (TAU)

Released: Feb 2026

Price: $5.00/1M tokens

Claude Sonnet 4.6

Anthropic

Best score: 0.9 (GPQA)

Released: Feb 2026

Price: $3.00/1M tokens

Claude Opus 4.1

Anthropic

Best score: 0.8 (TAU)

Released: Aug 2025

Price: $15.00/1M tokens

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.