Claude Sonnet 4
Multimodal
Claude Sonnet 4, part of the Claude 4 family, is a major upgrade over Claude Sonnet 3.7. The model excels at coding (72.7% on SWE-bench Verified) and logical reasoning, and follows instructions more precisely. Sonnet 4 offers a strong balance of capability and practicality, with improved steerability and support for extended thinking with tool use.
Key Specifications
Parameters
-
Context
200.0K
Release Date
May 22, 2025
Average Score
69.4%
Timeline
Key dates in the model's history
Announcement
May 22, 2025
Last Update
July 19, 2025
Technical Specifications
Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$3.00
Output (per 1M tokens)
$15.00
Max Input Tokens
200.0K
Max Output Tokens
128.0K
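At the listed rates ($3.00 per 1M input tokens, $15.00 per 1M output tokens), per-request cost is a simple linear function of token counts. A minimal sketch (the token counts below are illustrative, not from the source):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float = 3.00, output_rate: float = 15.00) -> float:
    """Estimate USD cost of one request at per-1M-token rates."""
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# Example: a 10K-token prompt producing a 2K-token reply
cost = estimate_cost(10_000, 2_000)
print(f"${cost:.4f}")  # → $0.0600
```

Note that output tokens are 5× the price of input tokens, so long generations (up to the 128K output cap) dominate cost for generation-heavy workloads.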
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
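Function calling works by declaring tools the model may invoke, each described by a name, a description, and a JSON-Schema input shape. A hedged sketch of a tool definition in the shape Anthropic's Messages API accepts; the `get_weather` tool and its fields are hypothetical examples, not part of the source:

```python
# Illustrative tool definition; "get_weather" and its schema are hypothetical.
weather_tool = {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
        },
        "required": ["city"],
    },
}

# Such definitions are passed in the request's tools list; the model then
# answers with a tool_use block naming the tool and its JSON arguments.
print(weather_tool["input_schema"]["required"])  # → ['city']
```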
Benchmark Results
Model performance metrics across various tests and benchmarks
Programming
Programming skills tests
SWE-Bench Verified
Parallel test-time compute (multiple attempts, selection by an internal scoring model). Without extended thinking. Based on footnote 5 and the SWE-bench methodology • Self-reported
Reasoning
Logical reasoning and analysis
GPQA
Diamond: extended thinking (up to 64K tokens) with parallel test-time compute (multiple attempts, selection by an internal scoring model). Based on footnote 5 and the blog appendix • Self-reported
Multimodal
Working with images and visual data
MMMU
Extended thinking (up to 64K tokens). Based on the blog appendix • Self-reported
Other Tests
Specialized benchmarks
AIME 2025
Extended thinking (up to 64K tokens) with parallel test-time compute (multiple attempts, selection by an internal scoring model). Sampled with top_p 0.95. Based on footnotes 4, 5 and the blog appendix • Self-reported
MMMLU
Extended thinking (up to 64K tokens). Average across 14 languages. Based on the blog appendix and footnote 3 • Self-reported
TAU-bench Airline
Extended thinking with tool use (up to 64K tokens, a prompt addendum, and an increased maximum number of steps). Based on the blog appendix and the TAU-bench methodology • Self-reported
TAU-bench Retail
Extended thinking with tool use (up to 64K tokens, a prompt addendum, and an increased maximum number of steps). Based on the blog appendix and the TAU-bench methodology • Self-reported
Terminal-bench
Parallel test-time compute (multiple attempts, selection by an internal scoring model). Without extended thinking. Claude Code used as the agent harness. Based on footnotes 2 and 5 • Self-reported
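Several of the scores above use parallel test-time compute: the model is sampled multiple times and a single attempt is selected by a scoring model. A generic best-of-n sketch of that idea; the generator and scorer below are stand-ins for illustration, not Anthropic's internal implementation:

```python
from typing import Callable

def best_of_n(generate: Callable[[], str],
              score: Callable[[str], float], n: int = 5) -> str:
    """Sample n candidate answers and return the highest-scoring one."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)

# Stand-in generator and scorer: favor answers containing "42".
answers = iter(["42", "41", "42", "43", "42"])
pick = best_of_n(lambda: next(answers),
                 score=lambda a: float(a.count("42")), n=5)
print(pick)  # → 42
```

The trade-off is straightforward: n samples cost roughly n× the inference compute, in exchange for the selection model's chance to pick a better attempt.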
License & Metadata
License
proprietary
Announcement Date
May 22, 2025
Last Updated
July 19, 2025
Similar Models
Claude 3 Haiku
Anthropic
MM
Best score: 0.9 (ARC)
Released: Mar 2024
Price: $0.25/1M tokens
Claude Opus 4
Anthropic
MM
Best score: 0.8 (GPQA)
Released: May 2025
Price: $15.00/1M tokens
Claude 3.7 Sonnet
Anthropic
MM
Best score: 0.8 (GPQA)
Released: Feb 2025
Price: $3.00/1M tokens
Claude 3 Sonnet
Anthropic
MM
Best score: 0.9 (ARC)
Released: Feb 2024
Price: $3.00/1M tokens
Claude 3.5 Sonnet
Anthropic
MM
Best score: 0.9 (HumanEval)
Released: Oct 2024
Price: $3.00/1M tokens
Claude Sonnet 4.6
Anthropic
MM
Best score: 0.9 (GPQA)
Released: Feb 2026
Price: $3.00/1M tokens
Claude Haiku 4.5
Anthropic
MM
Best score: 0.8 (TAU)
Released: Oct 2025
Price: $1.00/1M tokens
Claude Opus 4.5
Anthropic
MM
Best score: 0.9 (TAU)
Released: Nov 2025
Price: $5.00/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter count, and benchmark performance. Choose a model to compare, or browse the full catalog of available AI models.