
Claude 3.5 Sonnet

Multimodal
Anthropic

Claude 3.5 Sonnet is a powerful AI model with industry-leading software development skills. It excels at coding, planning, and problem-solving, demonstrating significant improvements in agentic coding and tool use. The model includes computer use capabilities in public beta, enabling it to interact with computer interfaces like a human user.

Key Specifications

Parameters
-
Context
200.0K
Release Date
October 22, 2024
Average Score
73.3%

Timeline

Key dates in the model's history
Announcement
October 22, 2024
Last Update
July 19, 2025
Today
March 25, 2026

Technical Specifications

Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$3.00
Output (per 1M tokens)
$15.00
Max Input Tokens
200.0K
Max Output Tokens
200.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
MMLU
5-shot CoT: the model is shown five worked examples, each consisting of a task, step-by-step reasoning, and the final answer, and is then given a new task to solve by applying the same reasoning process. Because the examples demonstrate the reasoning process rather than just the answers, the model effectively "thinks aloud" while solving the problem; this tends to work better than a bare "solve step by step" instruction and helps the model internalize the structure of a solution. 5-shot CoT is especially effective for mathematical problems, logic puzzles, and other tasks that require extended reasoning. Self-reported
90.4%
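The few-shot CoT setup described above can be sketched as a simple prompt-assembly step. This is a minimal illustration, not the actual MMLU evaluation harness; the example task and wording are hypothetical placeholders:

```python
# Minimal sketch of building a few-shot Chain-of-Thought prompt.
# The worked example below is an illustrative placeholder, not
# taken from any real benchmark harness.
EXAMPLES = [
    {
        "question": "What is 17 + 26?",
        "reasoning": "17 + 26 = 17 + 20 + 6 = 37 + 6 = 43.",
        "answer": "43",
    },
    # A real 5-shot prompt would include four more worked examples here.
]

def build_few_shot_cot_prompt(examples, new_question):
    """Concatenate worked examples, then append the new question,
    leaving the reasoning for the model to complete."""
    parts = []
    for ex in examples:
        parts.append(
            f"Question: {ex['question']}\n"
            f"Reasoning: {ex['reasoning']}\n"
            f"Answer: {ex['answer']}\n"
        )
    parts.append(f"Question: {new_question}\nReasoning:")
    return "\n".join(parts)

prompt = build_few_shot_cot_prompt(EXAMPLES, "What is 34 + 58?")
```

The prompt ends at "Reasoning:" so the model continues with its own chain of thought before stating an answer.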

Programming

Programming skills tests
HumanEval
0-shot: in a zero-shot setup the model receives the task without any examples or demonstrations and must solve it using only the knowledge acquired during pretraining. For instance, on a math problem the model is given only the problem itself, with no sample solutions to similar problems. This measures the model's baseline knowledge and abilities without additional guidance; performance is usually lower than with a few-shot approach. Self-reported
93.7%
SWE-Bench Verified
Standard. Self-reported
49.0%

Mathematics

Mathematical problems and computations
GSM8k
0-shot CoT: the model is prompted to reason step by step, without being shown any examples of such reasoning, by adding a phrase such as "Let's think about this step by step" to the query. Research has shown that this prompt significantly improves a language model's performance on reasoning tasks compared with asking for a direct answer. Although 0-shot CoT typically trails few-shot CoT, where the model is given sample step-by-step solutions, it still improves performance substantially without requiring additional examples. The method is especially effective for larger language models, which already have the ability to reason but may not apply it unless prompted to do so. Self-reported
96.4%
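The zero-shot CoT trigger described above amounts to appending a fixed phrase to the task. A minimal sketch, with a hypothetical example task:

```python
# Minimal sketch of zero-shot Chain-of-Thought prompting: no worked
# examples, just a trigger phrase appended to elicit step-by-step
# reasoning. The task below is an illustrative placeholder.
COT_TRIGGER = "Let's think step by step."

def build_zero_shot_cot_prompt(task):
    """Append the CoT trigger phrase after the task statement."""
    return f"{task}\n{COT_TRIGGER}"

prompt = build_zero_shot_cot_prompt(
    "A farmer has 12 sheep and buys 7 more. How many sheep are there now?"
)
```

Unlike few-shot CoT, no demonstrations are needed; the single trailing phrase is what nudges the model into explicit reasoning.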
MATH
Standard. Self-reported
78.3%
MGSM
0-shot CoT: zero-shot Chain-of-Thought has the model break a task into sequential reasoning steps without access to any example reasoning chains. The model generates intermediate reasoning that leads to the answer, but without ever being shown what a chain of thought should look like. The method is usually triggered by including a phrase such as "Let's think step by step" in the query, which encourages the model to reason explicitly before answering and often yields more accurate results than a direct answer. Self-reported
91.6%

Reasoning

Logical reasoning and analysis
BIG-Bench Hard
3-shot CoT: a standard few-shot Chain-of-Thought setup in which the model is given several worked examples (here, three) with reasoning before the new task; "few-shot" refers to the number of examples and "CoT" to the reasoning chains. When the model receives a new task, it can mirror the reasoning process shown in the examples. Since the common variant includes three examples, it is called "3-shot CoT". An advantage of this method is that it requires no complex instructions or prompt engineering: simply providing example solutions is enough. It is especially useful for mathematical and logical tasks, where step-by-step reasoning is critical to reaching the correct answer. Self-reported
93.1%
DROP
3-shot, F1 score. Self-reported
87.1%
GPQA
Maj@32, 5-shot CoT: a method for improving performance on reasoning and decision-making tasks that combines several approaches: 1. Chain-of-Thought: the model breaks a complex task into a sequence of intermediate steps, making its thinking process explicit. 2. Few-shot examples: the model is given several (here, 5) examples with correct reasoning and answers, which helps it learn the expected solution format. 3. Majority voting: the model generates many independent solutions to the same task (here, 32), and the most common answer is taken as final. This combination significantly improves accuracy on hard tasks: chain-of-thought structures the solution process, few-shot examples anchor the format, and majority voting averages out errors across attempts. Maj@32 with 5-shot CoT is especially effective for mathematical problems, logic puzzles, and other reasoning-heavy tasks. Self-reported
67.2%
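The majority-voting step of Maj@k can be sketched with a frequency count over sampled answers. This is a minimal illustration; the sampled answers below are stubbed placeholders standing in for 32 independent model completions:

```python
from collections import Counter

# Minimal sketch of Maj@k (self-consistency) voting: sample k
# independent answers from the model, then keep the most common one.
# The `samples` list below is a stub standing in for real model output.
def majority_vote(answers):
    """Return the most frequent answer among the sampled completions."""
    counts = Counter(answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# In Maj@32, 32 such answers would be drawn; 8 are shown for brevity.
samples = ["42", "42", "41", "42", "40", "42", "41", "42"]
final = majority_vote(samples)  # "42" wins with 5 of 8 votes
```

Voting works because independent reasoning chains tend to make uncorrelated errors, so the correct answer accumulates the most votes even when no single chain is reliable.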

Multimodal

Working with images and visual data
AI2D
test. Self-reported
94.7%
ChartQA
test, accuracy. Self-reported
90.8%
DocVQA
test, ANLS evaluation. Self-reported
95.2%
MathVista
testmini. Self-reported
67.7%
MMMU
Standard evaluation. Self-reported
68.3%

Other Tests

Specialized benchmarks
MMLU-Pro
5-shot: a few-shot prompting method in which the model's context includes five worked example solutions, with steps, before the task to be solved. This lets the model infer the expected solution format and apply the same approach to the new task without any additional tuning; the model is expected to follow the same format and reasoning demonstrated in the examples. Self-reported
77.6%
OSWorld Extended
Standard mode: the model is evaluated in the form in which it is typically used in real situations, receiving the prompt without special instructions on how to approach the task. This baseline mode measures out-of-the-box performance. Self-reported
22.0%
OSWorld Screenshot-only
Standard. Self-reported
14.9%
TAU-bench Airline
Standard: the model directly generates solutions to tasks without any additional instructions, which also allows comparison across prompting methods. The prompt format used was "Task: [task]. Please solve the task step by step." For some tasks the format was adjusted so the model could follow instructions specific to the data; for example, GPQA tasks were presented as the task text alone. Self-reported
46.0%
TAU-bench Retail
Standard, same setup as above. Self-reported
69.2%

License & Metadata

License
proprietary
Announcement Date
October 22, 2024
Last Updated
July 19, 2025

Similar Models

All Models

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.