Claude 3.7 Sonnet

Multimodal
Anthropic

The most intelligent Claude model and the first hybrid reasoning model on the market. Claude 3.7 Sonnet can provide near-instant responses or extended step-by-step thinking that is visible to the user. It demonstrates particularly significant improvements in coding and frontend web development.

Key Specifications

Parameters
-
Context
200.0K
Release Date
February 24, 2025
Average Score
74.1%

Timeline

Key dates in the model's history
Announcement
February 24, 2025
Last Update
July 19, 2025
Today
March 26, 2026

Technical Specifications

Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$3.00
Output (per 1M tokens)
$15.00
Max Input Tokens
200.0K
Max Output Tokens
128.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
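
As a rough illustration of how the listed prices translate into a per-request cost, the sketch below multiplies token counts by the listed rates. Only the $3.00 and $15.00 per-1M-token rates and the 200K / 128K token limits come from this listing; the helper name `estimate_cost_usd` and the example token counts are hypothetical, and the sketch ignores any discounts that may apply.

```python
# Rates and limits as listed on this page; everything else is illustrative.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00
MAX_INPUT_TOKENS = 200_000
MAX_OUTPUT_TOKENS = 128_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: list-price cost of one request in USD."""
    if not 0 <= input_tokens <= MAX_INPUT_TOKENS:
        raise ValueError("input_tokens outside the listed 200K limit")
    if not 0 <= output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError("output_tokens outside the listed 128K limit")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Example: a 50,000-token prompt with a 4,000-token reply.
print(f"${estimate_cost_usd(50_000, 4_000):.2f}")  # $0.21
```

Because output tokens cost five times as much as input tokens, long responses tend to dominate the bill even when the prompt is large.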

Benchmark Results

Model performance metrics across various tests and benchmarks

Programming

Programming skills tests
SWE-Bench Verified
A simpler method that uses several attempts to improve accuracy. In this method the LLM receives a question and makes several attempts to solve it, with the following components:
• Solution: the model makes 5-10 independent attempts to solve the task
• Structured approach: the model breaks the task into subtasks, works through a step-by-step solution, and checks for errors
• Answers: the model reviews all attempts and selects one solution as the final answer
Advantages: higher accuracy through additional thinking; especially efficient for complex tasks requiring multi-step reasoning.
Disadvantages: significantly more tokens, more time, and can be excessive for simple questions.
Self-reported
70.3%
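
The note above describes making 5-10 independent attempts and then choosing one solution. A minimal sketch of that idea follows, assuming majority voting as the selection rule; the source text is too damaged to confirm how the final solution is actually chosen. The `solve_once` callable and the stub answers are hypothetical stand-ins for a real model call.

```python
from collections import Counter
from typing import Callable


def best_of_n(solve_once: Callable[[str], str], task: str, n: int = 5) -> str:
    """Run n independent attempts and return the most frequent answer.

    Majority voting is an assumed selection rule, not one the source confirms.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    answers = [solve_once(task) for _ in range(n)]
    # most_common(1) returns [(answer, count)]; ties go to the earliest answer.
    return Counter(answers).most_common(1)[0][0]


# Stub solver that replays canned answers instead of calling a model.
canned = iter(["patch A", "patch B", "patch A", "patch A", "patch C"])
print(best_of_n(lambda task: next(canned), "fix failing test", n=5))  # patch A
```

The cost trade-off mentioned in the note is visible here: every extra attempt is another full call to the solver.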

Reasoning

Logical reasoning and analysis
GPQA
Diamond evaluation: an approach to assessing the model's reasoning. It evaluates the model along two dimensions:
- How the quality of the model's reasoning changes with more thinking (additional iterations)
- How this improvement depends on task complexity
For the Diamond evaluation we:
1. Write an instruction that encourages the model to think thoroughly and then give an answer.
2. Run the model on a set of tasks, allowing it to generate one justification and answer.
3. Have the same model analyze and reason about the same task again, and provide an answer.
4. Repeat the process several times for the same task.
5. Measure the model's accuracy at each thinking iteration.
Diamond reports model accuracy along two axes:
- X: task difficulty (the models that can correctly solve the task)
- Y: model accuracy across thinking iterations
This shows how the model improves with more thinking and how that improvement depends on the base complexity of the tasks.
Self-reported
84.8%
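
The last step of the procedure above, measuring accuracy at each thinking iteration, can be sketched as below. This is an illustration only: the function name `accuracy_by_iteration` is hypothetical and the `toy` grid is made-up data, not results from this benchmark.

```python
def accuracy_by_iteration(results: list[list[bool]]) -> list[float]:
    """results[t][k] is True if task t was answered correctly at iteration k."""
    if not results:
        return []
    n_iters = len(results[0])
    if any(len(row) != n_iters for row in results):
        raise ValueError("every task needs the same number of iterations")
    # Fraction of tasks answered correctly at each iteration index.
    return [sum(row[k] for row in results) / len(results) for k in range(n_iters)]


# Made-up outcomes for 4 tasks over 3 thinking iterations.
toy = [
    [False, True, True],
    [False, False, True],
    [True, True, True],
    [False, False, False],
]
print(accuracy_by_iteration(toy))  # [0.25, 0.5, 0.75]
```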

Multimodal

Working with images and visual data
MMMU
Standard evaluation
Self-reported
75.0%

Other Tests

Specialized benchmarks
AIME 2024
Self-reported
80.0%
AIME 2025
Test-time compute (4, 5)
Self-reported
54.8%
IFEval
Self-reported
93.2%
MATH-500
Self-reported
96.2%
MMMLU
Average across 14 languages (3)
Self-reported
86.1%
TAU-bench Airline
With a prompt addendum for better use of the AI. Results show that a simple addition to the prompt ("Please, [...] its answer before [...]") significantly improves model performance on mathematical tasks. In this setting, GPT-4 Turbo without the prompt addendum scores 20% on AIME tasks, while with it performance rises to 29.5%, a 47.5% relative improvement over the baseline. This approach shows that even without complex [...], simply asking the model about its answer can substantially improve reasoning capabilities.
Self-reported
58.4%
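
The 47.5% figure in the note above is consistent with a relative (not absolute) gain computed from the quoted 20% and 29.5% scores. The short check below shows the arithmetic; the function name `relative_gain` is hypothetical.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, as a fraction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# Scores quoted in the note: 20% without the addendum, 29.5% with it.
print(f"{relative_gain(20.0, 29.5):.1%}")  # 47.5%
```

The absolute gain is 9.5 percentage points; dividing by the 20% baseline gives the 47.5% relative gain.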
TAU-bench Retail
With a prompt addendum for better use of the AI. By analyzing my approach to solving this task, I can better evaluate which process is best to use for obtaining an exact answer.
1. Understand the task
- I read the task in order to understand what specifically is being asked
- I identify all the main components of the task
- I note key limitations or [...]
2. Plan the solution
- I consider several different approaches to solving it
- I choose the method that best matches the type of task
- I lay out the sequence of steps for obtaining the answer
3. Carry out the plan and solve the task
- I follow each step sequentially
- I [...] on each step
- If I run into [...], I revise my approach
4. Check the solution and answer
- I check whether the answer matches the task
- I verify the computation and review each step of the reasoning
5. State the final answer
- I clearly give the final answer
- I summarize the key steps that led to the solution
- I note [...] or approaches
By following this structured approach to solving, I [...] the task.
Self-reported
81.2%
Terminal-bench
Test-time compute, Claude Code (2, 5)
Self-reported
35.2%
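
The 74.1% Average Score listed under Key Specifications is consistent with an unweighted mean of the eleven benchmark scores above. The page does not state how the average is computed, so the snippet below is a check under that assumption rather than a definition.

```python
from statistics import mean

# Benchmark scores (%) exactly as listed on this page.
scores = {
    "SWE-Bench Verified": 70.3,
    "GPQA": 84.8,
    "MMMU": 75.0,
    "AIME 2024": 80.0,
    "AIME 2025": 54.8,
    "IFEval": 93.2,
    "MATH-500": 96.2,
    "MMMLU": 86.1,
    "TAU-bench Airline": 58.4,
    "TAU-bench Retail": 81.2,
    "Terminal-bench": 35.2,
}

# Assumed aggregation: a plain unweighted mean across all benchmarks.
average = mean(list(scores.values()))
print(f"{average:.1f}%")  # 74.1%
```

Note that an unweighted mean mixes benchmarks of very different difficulty, so the single number is mainly useful for rough comparison across models scored on the same set.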

License & Metadata

License
proprietary
Announcement Date
February 24, 2025
Last Updated
July 19, 2025
