Claude 3.7 Sonnet

Name: Claude 3.7 Sonnet
Author: Anthropic

Multimodal

At its release, Claude 3.7 Sonnet was presented as the most intelligent Claude model and the first hybrid reasoning model on the market. It can give near-instant responses or extended step-by-step thinking that is visible to the user, and it shows particularly significant improvements in coding and front-end web development.

Key Specifications

Parameters

Context

200.0K

Release Date

February 24, 2025

Average Score

74.1%

Timeline

Key dates in the model's history

Announcement

February 24, 2025

Last Update

July 19, 2025

Technical Specifications

Parameters

Training Tokens

Knowledge Cutoff

Family

Capabilities

Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)

$3.00

Output (per 1M tokens)

$15.00

Max Input Tokens

200.0K

Max Output Tokens

128.0K
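
The per-token prices and limits above translate directly into a per-request cost. The following is a minimal sketch of that arithmetic, assuming the listed figures apply uniformly to every token (no caching or batch discounts); the function name `estimate_cost_usd` is hypothetical and not part of any Anthropic API.

```python
# Figures copied from the listing above; everything else is illustrative.
INPUT_USD_PER_MTOK = 3.00     # input price per 1M tokens
OUTPUT_USD_PER_MTOK = 15.00   # output price per 1M tokens
MAX_INPUT_TOKENS = 200_000
MAX_OUTPUT_TOKENS = 128_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars (hypothetical helper)."""
    if not 0 <= input_tokens <= MAX_INPUT_TOKENS:
        raise ValueError("input_tokens is outside the listed input limit")
    if not 0 <= output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError("output_tokens is outside the listed output limit")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# 50K input tokens cost $0.15 and 4K output tokens cost $0.06.
cost = estimate_cost_usd(50_000, 4_000)
print(f"${cost:.2f}")
```

Output tokens cost five times as much as input tokens at these prices, so long extended-thinking responses tend to dominate the bill.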

Supported Features

Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning

Benchmark Results

Model performance metrics across various tests and benchmarks

Programming

Programming skills tests

SWE-Bench Verified

Evaluated with several attempts per task. In this method the model receives a question and makes several attempts to solve it, combining the following components:
- Independent solutions: the model makes 5-10 independent attempts to solve the task.
- Structured approach: the model splits the task into subtasks, works out a step-by-step solution, and checks for errors.
- Answer selection: the model reviews all attempts and selects one solution.

Advantages: higher accuracy through additional thinking; especially effective for complex tasks that require multi-step reasoning. Disadvantages: significantly more tokens, more time, and can be excessive for simple questions. • Self-reported

70.3%
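
The SWE-Bench Verified note above describes making several independent attempts and then choosing one solution. A minimal sketch of that pattern follows. The selection rule here is a plain majority vote, which is an assumption, since the note does not say how the final solution is chosen; `toy_attempt` is a hypothetical stand-in for a real model call.

```python
from collections import Counter
from typing import Callable, Tuple


def solve_with_attempts(
    attempt: Callable[[str, int], str], task: str, n_attempts: int = 5
) -> Tuple[str, int]:
    """Run independent attempts and return (most common answer, its vote count)."""
    if n_attempts < 1:
        raise ValueError("n_attempts must be at least 1")
    answers = [attempt(task, i) for i in range(n_attempts)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes


def toy_attempt(task: str, attempt_index: int) -> str:
    # Hypothetical stand-in for a model call: wrong on every third attempt.
    return "41" if attempt_index % 3 == 2 else "42"


answer, votes = solve_with_attempts(toy_attempt, "6 * 7 = ?", n_attempts=5)
print(answer, votes)
```

For code-repair tasks a vote over identical strings is rarely practical; a real harness would more likely rank candidate patches by test results, which is outside what the note specifies.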

Reasoning

Logical reasoning and analysis

GPQA

GPQA Diamond, evaluated with extended thinking. The evaluation looks at two questions:
- How the quality of the model's reasoning changes with thinking (additional iterations).
- How this improvement depends on task complexity.

Procedure:
1. Write an instruction that encourages the model to think the task over thoroughly and then give an answer.
2. Run the model on a set of tasks, allowing it to generate one justification and answer.
3. Have the same model repeat its analysis and reasoning on the same task and provide an answer.
4. Repeat the process several times for the same tasks.
5. Measure model accuracy at each thinking iteration.

Accuracy is reported along two axes:
- X: task complexity (measured by the models that can correctly solve the task).
- Y: model accuracy at successive iterations of thinking.

This shows how the model improves with thinking and how the improvement depends on the base complexity of the tasks. • Self-reported

84.8%
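
The GPQA note above reports accuracy at each thinking iteration, broken down by task complexity. The sketch below shows one way such a table could be tallied from raw results. The record layout `(difficulty_bucket, iteration, is_correct)` and the toy data are assumptions made for illustration, not the evaluation's actual format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, int, bool]  # (difficulty bucket, thinking iteration, correct?)


def accuracy_by_iteration(records: Iterable[Record]) -> Dict[Tuple[str, int], float]:
    """Return accuracy for every (difficulty bucket, iteration) pair."""
    totals = defaultdict(lambda: [0, 0])  # key -> [correct count, total count]
    for bucket, iteration, correct in records:
        cell = totals[(bucket, iteration)]
        cell[0] += int(correct)
        cell[1] += 1
    return {key: hits / n for key, (hits, n) in totals.items()}


# Toy data, invented purely to exercise the function.
toy_records = [
    ("easy", 1, True), ("easy", 1, True), ("easy", 2, True),
    ("hard", 1, False), ("hard", 1, True), ("hard", 2, True), ("hard", 2, True),
]
table = accuracy_by_iteration(toy_records)
for key in sorted(table):
    print(key, table[key])
```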

Multimodal

Working with images and visual data

MMMU

Standard evaluation • Self-reported

75.0%

Other Tests

Specialized benchmarks

AIME 2024

• Self-reported

80.0%

AIME 2025

Test-time compute (4, 5) • Self-reported

54.8%

IFEval

• Self-reported

93.2%

MATH-500

• Self-reported

96.2%

MMMLU

Average across 14 languages (3) • Self-reported

86.1%

TAU-bench Airline

With an addition to the prompt for better use of the AI: results show that a simple addition to the prompt ("Please, ... its answer before ...") significantly improves model performance on mathematical tasks. In this setting, GPT-4 Turbo without the prompt addition scores 20% on AIME tasks, while with it performance rises to 29.5%, a 47.5% relative improvement over the baseline. This suggests that even without complex output processing, simply asking the model about its answer can substantially improve its reasoning capabilities. • Self-reported

58.4%
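
The note above quotes a move from 20% to 29.5% as a 47.5% improvement. That figure is a relative gain, not a difference in percentage points, and the short sketch below checks the arithmetic. The function name `relative_improvement_pct` is hypothetical.

```python
def relative_improvement_pct(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100 * (improved - baseline) / baseline


points_gained = 29.5 - 20.0                          # absolute gain in percentage points
relative_gain = relative_improvement_pct(20.0, 29.5)  # gain relative to the baseline
print(points_gained, relative_gain)
```

The absolute gain is 9.5 percentage points; dividing by the 20% baseline gives the quoted 47.5%.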

TAU-bench Retail

With an addition to the prompt for more effective use of the AI: an analysis of the approach to solving problems. By reflecting on my approach to solving this task, I can better evaluate which process is best to use for obtaining an exact answer.
1. Read and understand the task
- I read the task in order to understand what specifically is asked.
- I identify all the main components of the task.
- I note key limitations or constraints.
2. Plan the solution
- I consider several different approaches to solving it.
- I choose the method that best matches the type of task.
- I lay out the sequence of steps for obtaining the answer.
3. Carry out the plan and solve the task
- I follow each step sequentially.
- I check my work at each step.
- If I run into a problem, I revise my approach.
4. Check the solution and answer
- I confirm that the answer matches the task.
- I verify the computation and review each step of the reasoning.
5. State the final answer clearly
- I summarize the key steps that led to the solution.
- I note alternative approaches.

Following this structured approach, I work through the task. • Self-reported

81.2%
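
The TAU-bench Retail note above describes a numbered problem-solving routine added to the prompt. A minimal sketch of attaching such a routine to a question is shown below. The step wording, the `STEPS` list and the function `with_structured_steps` are illustrative assumptions, not the actual prompt used in the evaluation.

```python
from typing import List

# Illustrative step names that paraphrase the five stages in the note above.
STEPS: List[str] = [
    "Read and understand the task.",
    "Plan the solution.",
    "Carry out the plan step by step.",
    "Check the solution and the answer.",
    "State the final answer clearly.",
]


def with_structured_steps(question: str) -> str:
    """Append a numbered problem-solving routine to a question (hypothetical helper)."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(STEPS, start=1))
    return f"{question}\n\nWork through these steps before answering:\n{numbered}"


prompt = with_structured_steps("A customer wants to exchange an order. What should happen next?")
print(prompt)
```

Keeping the routine in a list makes it easy to drop or reorder steps when comparing prompt variants.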

Terminal-bench

Test-time compute, Claude Code (2, 5) • Self-reported

35.2%

License & Metadata

License

proprietary

Announcement Date

February 24, 2025

Last Updated

July 19, 2025

Similar Models

Claude 3.5 Sonnet

Anthropic

Best score: 0.9 (HumanEval)

Released: Oct 2024

Price: $3.00/1M tokens

Claude 3 Haiku

Anthropic

Best score: 0.9 (ARC)

Released: Mar 2024

Price: $0.25/1M tokens

Claude Sonnet 4

Anthropic

Best score: 0.8 (GPQA)

Released: May 2025

Price: $3.00/1M tokens

Claude Opus 4

Anthropic

Best score: 0.8 (GPQA)

Released: May 2025

Price: $15.00/1M tokens

Claude 3 Sonnet

Anthropic

Best score: 0.9 (ARC)

Released: Feb 2024

Price: $3.00/1M tokens

Claude Haiku 4.5

Anthropic

Best score: 0.8 (TAU)

Released: Oct 2025

Price: $1.00/1M tokens

Claude Opus 4.6

Anthropic

Best score: 1.0 (TAU)

Released: Feb 2026

Price: $5.00/1M tokens

Claude Sonnet 4.5

Anthropic

Best score: 0.9 (TAU)

Released: Sep 2025

Price: $3.00/1M tokens

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.