Claude Sonnet 4

Multimodal
Anthropic

Claude Sonnet 4, part of the Claude 4 family, is a major upgrade over Claude Sonnet 3.7. The model excels at coding (72.7% on SWE-Bench Verified) and logical reasoning, and it follows instructions more precisely. Sonnet 4 balances capability and practicality, offers improved steerability, and supports extended thinking with tool use.

Key Specifications

Parameters
-
Context
200.0K
Release Date
May 22, 2025
Average Score
69.4%

Timeline

Key dates in the model's history
Announcement
May 22, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
Claude 4
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$3.00
Output (per 1M tokens)
$15.00
Max Input Tokens
200.0K
Max Output Tokens
128.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
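
The per-million-token prices above translate into a per-request cost with simple arithmetic. The following is a minimal sketch, assuming flat list pricing with no discounts or surcharges; the function name `estimate_cost_usd` and the constants are illustrative and not part of any official SDK.

```python
from decimal import Decimal

# List prices from the table above, in USD per 1M tokens.
INPUT_USD_PER_MTOK = Decimal("3.00")
OUTPUT_USD_PER_MTOK = Decimal("15.00")
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request under flat list pricing (an assumption)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_MTOK
        + Decimal(output_tokens) * OUTPUT_USD_PER_MTOK
    )
    return total / TOKENS_PER_MTOK


if __name__ == "__main__":
    # 10,000 input tokens and 2,000 output tokens.
    print(estimate_cost_usd(10_000, 2_000))
```

Under these assumptions, a request with 10,000 input tokens and 2,000 output tokens costs $0.03 for input plus $0.03 for output, or $0.06 in total. `Decimal` is used so that currency amounts are not subject to binary floating-point rounding.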

Benchmark Results

Model performance metrics across various tests and benchmarks

Programming

Programming skills tests
SWE-Bench Verified
Test-time compute (multiple attempts, selection with the help of model evaluation). Without extended thinking. Based on note 5 and the SWE-bench methodology. Self-reported.
72.7%
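
The note on this result mentions several attempts with a model-assisted choice among them. A minimal sketch of that general pattern, often called best-of-n selection, is shown here. The `generate` and `score` callables are hypothetical stand-ins, and this is an illustration of the idea, not the evaluation harness behind the reported number.

```python
from typing import Callable


def best_of_n(
    prompt: str,
    generate: Callable[[str, int], str],
    score: Callable[[str, str], float],
    n: int = 4,
) -> str:
    """Produce n candidate answers and return the one the scorer rates highest.

    generate(prompt, attempt_index) -> candidate answer (hypothetical model call)
    score(prompt, candidate) -> higher is better (hypothetical evaluator model)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt, i) for i in range(n)]
    # max() returns the first candidate with the highest score on ties.
    return max(candidates, key=lambda c: score(prompt, c))


if __name__ == "__main__":
    # Toy stand-ins: each attempt is longer, and the scorer prefers longer text.
    def toy_generate(prompt: str, i: int) -> str:
        return "x" * (i + 1)

    def toy_score(prompt: str, candidate: str) -> float:
        return float(len(candidate))

    print(best_of_n("demo", toy_generate, toy_score, n=3))
```

In a real setting the scorer would itself be a model or a test suite, and the number of attempts trades extra compute for a better chance of a correct answer.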

Reasoning

Logical reasoning and analysis
GPQA
Diamond: Extended thinking (up to 64K tokens) with test-time compute (multiple attempts, selection with the help of model evaluation). Based on note 5 and the blog post. Self-reported.
75.4%

Multimodal

Working with images and visual data
MMMU
Extended thinking (up to 64K tokens). Based on the blog post. Self-reported.
74.4%

Other Tests

Specialized benchmarks
AIME 2025
Extended thinking (up to 64K tokens) with test-time compute (several attempts, selection with the help of model evaluation). Sampled with top_p 0.95. Based on notes 4 and 5 and the appendix to the blog post. Self-reported.
70.5%
MMMLU
Extended thinking (up to 64K tokens). Average over 14 languages. Based on the blog post and note 3. Self-reported.
86.5%
TAU-bench Airline
Extended thinking with tool use (up to 64K tokens, a prompt addendum, increased maximum number of steps). Based on the appendix to the blog post and the TAU-bench methodology. Self-reported.
60.0%
TAU-bench Retail
Extended thinking with tool use (up to 64K tokens, a prompt addendum, increased maximum number of steps). Based on the appendix to the blog post and the TAU-bench methodology. Self-reported.
80.5%
Terminal-bench
Test-time compute (multiple attempts, internal model used for evaluation). Without extended thinking. Run with Claude Code. Based on notes 2 and 5. Self-reported.
35.5%
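
The page does not say how the Average Score under Key Specifications is computed. The figure is consistent with an unweighted mean of the eight benchmark scores listed above, which the following sketch checks. The dictionary layout and variable names are illustrative assumptions.

```python
# Benchmark scores as listed on this page, in percent.
scores = {
    "SWE-Bench Verified": 72.7,
    "GPQA": 75.4,
    "MMMU": 74.4,
    "AIME 2025": 70.5,
    "MMMLU": 86.5,
    "TAU-bench Airline": 60.0,
    "TAU-bench Retail": 80.5,
    "Terminal-bench": 35.5,
}

# Unweighted arithmetic mean: every benchmark counts equally (an assumption).
mean_score = sum(scores.values()) / len(scores)

print(f"{mean_score:.1f}%")
```

The scores sum to 555.5, and dividing by 8 gives 69.4375, which rounds to the 69.4% shown on the page.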

License & Metadata

License
proprietary
Announcement Date
May 22, 2025
Last Updated
July 19, 2025
