Claude Sonnet 4
Multimodal
Claude Sonnet 4, part of the Claude 4 family, is a major upgrade over Claude Sonnet 3.7. The model excels at coding (72.7% on SWE-bench Verified) and logical reasoning, and follows instructions more precisely. Sonnet 4 offers a strong balance of capability and practicality, with improved steerability and support for extended thinking with tool use.
Key Specifications
Parameters
-
Context
200.0K
Release Date
May 22, 2025
Average Score
69.4%
Timeline
Key dates in the model's history
Announcement
May 22, 2025
Last Update
July 19, 2025
Technical Specifications
Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$3.00
Output (per 1M tokens)
$15.00
Max Input Tokens
200.0K
Max Output Tokens
128.0K
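At the listed rates ($3.00 per 1M input tokens, $15.00 per 1M output tokens), per-request cost is a simple linear function of token counts. A minimal sketch (the token counts below are illustrative, not from the source):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float = 3.00, output_rate: float = 15.00) -> float:
    """Estimate USD cost of one request at per-1M-token rates."""
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# Example: a 10K-token prompt producing a 2K-token reply
cost = estimate_cost(10_000, 2_000)
print(f"${cost:.4f}")  # → $0.0600
```

Note that output tokens are 5× the price of input tokens, so long generations (up to the 128K output cap) dominate cost for generation-heavy workloads.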
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
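Function calling works by declaring tools the model may invoke, each described by a name, a description, and a JSON-Schema input shape. A hedged sketch of a tool definition in the shape Anthropic's Messages API accepts; the `get_weather` tool and its fields are hypothetical examples, not part of the source:

```python
# Illustrative tool definition; "get_weather" and its schema are hypothetical.
weather_tool = {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
        },
        "required": ["city"],
    },
}

# Such definitions are passed in the request's tools list; the model then
# answers with a tool_use block naming the tool and its JSON arguments.
print(weather_tool["input_schema"]["required"])  # → ['city']
```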
Benchmark Results
Model performance metrics across various tests and benchmarks
Programming
Programming skills tests
SWE-Bench Verified
Parallel test-time compute (multiple attempts, selection by an internal scoring model). Without extended thinking. Based on footnote 5 and the SWE-bench methodology • Self-reported
Reasoning
Logical reasoning and analysis
GPQA
Diamond: extended thinking (up to 64K tokens) with parallel test-time compute (multiple attempts, selection by an internal scoring model). Based on footnote 5 and the blog appendix • Self-reported
Multimodal
Working with images and visual data
MMMU
Extended thinking (up to 64K tokens). Based on the blog appendix • Self-reported
Other Tests
Specialized benchmarks
AIME 2025
Extended thinking (up to 64K tokens) with parallel test-time compute (multiple attempts, selection by an internal scoring model). Sampled with top_p 0.95. Based on footnotes 4, 5 and the blog appendix • Self-reported
MMMLU
Extended thinking (up to 64K tokens). Average across 14 languages. Based on the blog appendix and footnote 3 • Self-reported
TAU-bench Airline
Extended thinking with tool use (up to 64K tokens, a prompt addendum, and an increased maximum number of steps). Based on the blog appendix and the TAU-bench methodology • Self-reported
TAU-bench Retail
Extended thinking with tool use (up to 64K tokens, a prompt addendum, and an increased maximum number of steps). Based on the blog appendix and the TAU-bench methodology • Self-reported
Terminal-bench
Parallel test-time compute (multiple attempts, selection by an internal scoring model). Without extended thinking. Claude Code used as the agent harness. Based on footnotes 2 and 5 • Self-reported
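Several of the scores above use parallel test-time compute: the model is sampled multiple times and a single attempt is selected by a scoring model. A generic best-of-n sketch of that idea; the generator and scorer below are stand-ins for illustration, not Anthropic's internal implementation:

```python
from typing import Callable

def best_of_n(generate: Callable[[], str],
              score: Callable[[str], float], n: int = 5) -> str:
    """Sample n candidate answers and return the highest-scoring one."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)

# Stand-in generator and scorer: favor answers containing "42".
answers = iter(["42", "41", "42", "43", "42"])
pick = best_of_n(lambda: next(answers),
                 score=lambda a: float(a.count("42")), n=5)
print(pick)  # → 42
```

The trade-off is straightforward: n samples cost roughly n× the inference compute, in exchange for the selection model's chance to pick a better attempt.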
License & Metadata
License
proprietary
Announcement Date
May 22, 2025
Last Updated
July 19, 2025
Similar Models
Claude 3 Haiku
Anthropic
MM
Best score: 0.9 (ARC)
Released: Mar 2024
Price: $0.25/1M tokens
Claude Opus 4
Anthropic
MM
Best score: 0.8 (GPQA)
Released: May 2025
Price: $15.00/1M tokens
Claude 3.7 Sonnet
Anthropic
MM
Best score: 0.8 (GPQA)
Released: Feb 2025
Price: $3.00/1M tokens
Claude 3 Sonnet
Anthropic
MM
Best score: 0.9 (ARC)
Released: Feb 2024
Price: $3.00/1M tokens
Claude 3.5 Sonnet
Anthropic
MM
Best score: 0.9 (HumanEval)
Released: Oct 2024
Price: $3.00/1M tokens
Claude Sonnet 4.6
Anthropic
MM
Best score: 0.9 (GPQA)
Released: Feb 2026
Price: $3.00/1M tokens
Claude Haiku 4.5
Anthropic
MM
Best score: 0.8 (TAU)
Released: Oct 2025
Price: $1.00/1M tokens
Claude Opus 4.5
Anthropic
MM
Best score: 0.9 (TAU)
Released: Nov 2025
Price: $5.00/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter count, and benchmark performance. Choose a model to compare, or browse the full catalog of available AI models.