Claude 3.5 Sonnet
Multimodal
Claude 3.5 Sonnet is a powerful AI model with industry-leading software development skills. It excels at coding, planning, and problem-solving, demonstrating significant improvements in agentic coding and tool use. The model includes computer use capabilities in public beta, enabling it to interact with computer interfaces as a human user would.
Key Specifications
Parameters
-
Context
200.0K
Release Date
October 22, 2024
Average Score
73.3%
Timeline
Key dates in the model's history
Announcement
October 22, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$3.00
Output (per 1M tokens)
$15.00
Max Input Tokens
200.0K
Max Output Tokens
8,192
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
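At the listed rates ($3.00 per 1M input tokens, $15.00 per 1M output tokens), request cost is linear in token counts. The sketch below assumes flat pricing only: batch discounts, prompt-caching rates, and other tiers are not modeled, and the function name `estimate_cost_usd` is a hypothetical illustration.

```python
# Listed Claude 3.5 Sonnet prices in USD per 1M tokens (taken from the table above).
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the flat listed rates.

    Assumption: no batch discounts, prompt-caching rates, or other pricing tiers.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


if __name__ == "__main__":
    # 10K input tokens ($0.03) plus 2K output tokens ($0.03).
    print(f"${estimate_cost_usd(10_000, 2_000):.4f}")  # prints $0.0600
```

Because output tokens cost five times as much as input tokens, long generations dominate the bill even when prompts are large.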
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
5-shot CoT: Before the new task, the model is shown five worked examples, each solved with step-by-step (chain-of-thought) reasoning.
How it works:
1. The prompt contains five tasks of the same kind, each with its solution.
2. Each example shows the task and the reasoning that leads to the answer.
3. After these demonstrations, the model receives the new task.
4. The model is expected to apply the same reasoning process to the new task.
Advantages:
- Demonstrates the reasoning process, not just final answers
- Encourages the model to "think aloud" while solving problems
- Often more effective than a bare "solve step by step" instruction
- Helps the model grasp the structure of a solution
5-shot CoT is especially effective for mathematical problems, logic puzzles, and other reasoning-heavy tasks: by seeing several examples with detailed reasoning, the model picks up a process it can reuse on similar tasks • Self-reported
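A minimal sketch of how a 5-shot CoT prompt might be assembled. The `Question:`/`Reasoning:`/`Answer:` layout and the names `Example` and `build_few_shot_cot_prompt` are hypothetical; the exact template behind the reported MMLU score is not given here. Setting `k=3` gives the 3-shot CoT configuration listed for BIG-Bench Hard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    question: str
    reasoning: str  # worked, step-by-step solution
    answer: str


def build_few_shot_cot_prompt(examples: List[Example], question: str, k: int = 5) -> str:
    """Assemble a k-shot chain-of-thought prompt.

    Each of the first k demonstrations shows a question, its reasoning, and the
    final answer; the target question is appended last with an open
    "Reasoning:" slot so the model continues in the same format.
    """
    if len(examples) < k:
        raise ValueError(f"need at least {k} examples, got {len(examples)}")
    blocks = [
        f"Question: {ex.question}\nReasoning: {ex.reasoning}\nAnswer: {ex.answer}"
        for ex in examples[:k]
    ]
    blocks.append(f"Question: {question}\nReasoning:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [
        Example(f"What is {n} + {n}?", f"{n} + {n} = {2 * n}.", str(2 * n))
        for n in range(1, 6)
    ]
    print(build_few_shot_cot_prompt(demos, "What is 7 + 8?"))
```

Ending the prompt at an open `Reasoning:` slot nudges the model to produce its own chain of thought before committing to an answer, mirroring the demonstrations.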
Programming
Programming skills tests
HumanEval
0-shot: In the zero-shot setting the model receives the task without any examples or demonstrations and must solve it using only the knowledge acquired during pretraining. For example, on a math problem it sees only the problem itself, not solutions to similar problems. This setting measures the model's baseline knowledge and ability without additional in-context help; performance is usually lower than with a few-shot setup • Self-reported
SWE-Bench Verified
Standard • Self-reported
Mathematics
Mathematical problems and computations
GSM8k
0-shot CoT: With zero-shot chain-of-thought prompting, the model reasons step by step about how to solve the problem even though it is given no examples of such reasoning. This is typically done by adding a phrase such as "Let's think step by step" to the query. Research has shown that this prompt substantially improves language models' performance on reasoning tasks compared with answering the question directly. Although 0-shot CoT generally trails few-shot CoT, where the model is given examples of step-by-step reasoning, it still improves performance considerably without requiring any additional examples. The method is especially effective for larger language models, which already have the ability to reason but may not apply it unless prompted to do so • Self-reported
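A minimal sketch of 0-shot CoT prompting for GSM8K-style word problems (the same idea applies to MGSM). The trigger phrase comes from the description above; the `Q:`/`A:` layout and the "last number in the reasoning is the answer" extraction heuristic are assumptions for illustration, not the harness behind the reported score.

```python
import re
from typing import Optional

# Trigger phrase appended to the question; no worked examples are included.
COT_TRIGGER = "Let's think step by step."

# Integers or decimals, optionally negative, with optional thousands separators.
NUMBER_RE = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def build_zero_shot_cot_prompt(question: str) -> str:
    """Pose the question and start the answer with the reasoning trigger."""
    return f"Q: {question}\nA: {COT_TRIGGER}"


def extract_final_number(completion: str) -> Optional[str]:
    """Take the last number in the model's reasoning as its final answer (a heuristic)."""
    matches = NUMBER_RE.findall(completion)
    if not matches:
        return None
    # Drop thousands separators so "1,250" compares equal to "1250".
    return matches[-1].replace(",", "")


if __name__ == "__main__":
    print(build_zero_shot_cot_prompt("A shop sells 3 pens for $4. How much do 12 pens cost?"))
    print(extract_final_number("12 pens is 4 groups of 3, and 4 * 4 = 16. The answer is 16."))
```

The model's completion is generated after the trigger, so the reasoning precedes the answer; the extractor then has to pick the answer out of free text, which is why a fixed heuristic like this one is needed.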
MATH
Standard • Self-reported
MGSM
0-shot CoT: In zero-shot chain-of-thought prompting, the model solves a task by breaking it into sequential reasoning steps without being shown any examples of such reasoning beforehand. The model generates intermediate reasoning that leads to the answer, but receives no demonstration of what a chain of thought should look like. The method is usually triggered by adding a phrase such as "Let's think step by step" to the query, or a similar instruction that steers the model toward reasoning. Encouraging explicit step-by-step reasoning often yields more accurate results than answering directly • Self-reported
Reasoning
Logical reasoning and analysis
BIG-Bench Hard
3-shot CoT: This setup uses the standard chain-of-thought (CoT) method, giving the model a few worked examples (here three) with reasoning before the task. Such an approach is called "few-shot CoT": "few-shot" refers to the small number of examples, and "CoT" to the reasoning chains they contain. When the model receives a new task, it can follow these examples to structure its own reasoning; because the common configuration uses three examples, it is called "3-shot CoT". The advantage of 3-shot CoT is that it requires no elaborate instructions or query engineering: providing example solutions is enough. This is especially useful for mathematical and logical tasks, where step-by-step reasoning is critical for reaching the correct answer • Self-reported
DROP
3-shot, F1 score • Self-reported
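A simplified sketch of a token-level F1 score of the kind used for DROP answers: normalize both strings, count overlapping tokens, and combine precision and recall. The official DROP evaluator also normalizes numbers and aligns multi-span answers, which this sketch omits; the function names are illustrative.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, strip punctuation and English articles, then split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    """Bag-of-tokens F1 between a predicted answer and one gold answer."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    if not pred_tokens and not gold_tokens:
        return 1.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction: str, gold_answers: List[str]) -> float:
    """Score against every acceptable gold answer and keep the best match."""
    return max(token_f1(prediction, g) for g in gold_answers)
```

Partial credit is the point of F1 here: "two touchdowns" against a gold answer of "2 touchdowns" scores 0.5 rather than 0, which is also why number normalization matters in the full evaluator.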
GPQA
Maj@32 5-shot CoT: This method improves model performance on logical inference and decision-making tasks by combining three approaches:
1. **Chain-of-thought reasoning**: the model breaks a complex task into a sequence of intermediate steps, making its thought process explicit.
2. **Few-shot examples**: the model is given several examples (here five) with correct reasoning and answers, which help it understand the expected solution format.
3. **Majority voting**: the model generates many independent solutions to the same task (here 32), and the answer that occurs most often is selected as the final answer.
This combination substantially improves accuracy on complex tasks because chain-of-thought reasoning structures the solution process, the few-shot examples guide the model toward the expected format, and voting across attempts reduces the impact of errors in individual attempts. Maj@32 5-shot CoT is especially effective for mathematical problems, logic puzzles, and other reasoning-heavy tasks • Self-reported
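A minimal sketch of the majority-voting step in Maj@32. `sample_fn` is a hypothetical stand-in for one sampled 5-shot CoT run that returns its extracted final answer (for GPQA, an option letter), or `None` if no answer could be parsed. Breaking ties by first appearance is an assumption; the description above does not specify a tie rule.

```python
from collections import Counter
from typing import Callable, List, Optional


def majority_vote(answers: List[Optional[str]]) -> Optional[str]:
    """Return the most frequent parsed answer; ties go to the answer seen first."""
    valid = [a for a in answers if a]
    if not valid:
        return None
    # Counter.most_common lists equal counts in first-encountered order.
    return Counter(valid).most_common(1)[0][0]


def maj_at_k(sample_fn: Callable[[], Optional[str]], k: int = 32) -> Optional[str]:
    """Draw k independent final answers and return the majority answer."""
    return majority_vote([sample_fn() for _ in range(k)])
```

Voting only helps when samples are genuinely independent, so the individual runs would be drawn with a nonzero sampling temperature rather than greedy decoding.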
Multimodal
Working with images and visual data
AI2D
test • Self-reported
ChartQA
test, accuracy • Self-reported
DocVQA
test, ANLS metric • Self-reported
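A minimal sketch of ANLS (Average Normalized Levenshtein Similarity), the DocVQA metric. For each question, the normalized edit distance NL between prediction and gold answer is turned into a similarity 1 − NL, set to 0 when NL reaches the threshold τ = 0.5; the best score over the gold answers is kept and the scores are averaged over questions. Lowercasing and whitespace stripping are assumed normalizations; the function names are illustrative.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def anls(predictions: List[str], gold_answers: List[List[str]], tau: float = 0.5) -> float:
    """Average Normalized Levenshtein Similarity over a set of questions."""
    if len(predictions) != len(gold_answers):
        raise ValueError("need one list of gold answers per prediction")
    if not predictions:
        return 0.0
    total = 0.0
    for pred, golds in zip(predictions, gold_answers):
        p = pred.strip().lower()
        best = 0.0
        for gold in golds:
            g = gold.strip().lower()
            longest = max(len(p), len(g))
            nl = levenshtein(p, g) / longest if longest else 0.0
            # Similarity counts only when the normalized distance is below tau.
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        total += best
    return total / len(predictions)
```

The threshold keeps small OCR-style slips (for example "invoce" vs "invoice", scoring 6/7) from being treated as total failures, while answers that are mostly wrong still score zero.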
MathVista
testmini • Self-reported
MMMU
Standard evaluation • Self-reported
Other Tests
Specialized benchmarks
MMLU-Pro
5-shot: Few-shot prompting with five examples. The model is given five solved example problems, including their solution steps, before the target question. This lets it infer the expected format and approach and apply them to the new task without any additional fine-tuning. The context therefore contains five worked examples followed by the question to be solved, and the model is expected to follow the same format and reasoning as in the examples • Self-reported
OSWorld Extended
Standard: The model is evaluated in the form in which it is typically used in real-world situations. It receives the prompt without additional instructions on how to approach the task. This baseline mode measures the model's default performance • Self-reported
OSWorld Screenshot-only
Standard • Self-reported
TAU-bench Airline
Standard: In this setting the model generates solutions to tasks directly, without additional instructions. It also serves as a baseline for comparing different prompting methods. The following prompt format was used:
```
Task: [task]
Please solve the task step by step.
```
For some tasks, however, the format was adapted to follow task-specific instructions contained in the data. For example, GPQA tasks used the format:
```
[task]
```
• Self-reported
TAU-bench Retail
Standard (same setup as above) • Self-reported
License & Metadata
License
proprietary
Announcement Date
October 22, 2024
Last Updated
July 19, 2025
Similar Models
Claude 3 Haiku
Anthropic
MM
Best score:0.9 (ARC)
Released:Mar 2024
Price:$0.25/1M tokens
Claude Sonnet 4
Anthropic
MM
Best score:0.8 (GPQA)
Released:May 2025
Price:$3.00/1M tokens
Claude Opus 4
Anthropic
MM
Best score:0.8 (GPQA)
Released:May 2025
Price:$15.00/1M tokens
Claude 3.5 Sonnet
Anthropic
MM
Best score:0.9 (HumanEval)
Released:Jun 2024
Price:$3.00/1M tokens
Claude 3 Opus
Anthropic
MM
Best score:1.0 (ARC)
Released:Feb 2024
Price:$15.00/1M tokens
Claude 3.7 Sonnet
Anthropic
MM
Best score:0.8 (GPQA)
Released:Feb 2025
Price:$3.00/1M tokens
Claude 3 Sonnet
Anthropic
MM
Best score:0.9 (ARC)
Released:Feb 2024
Price:$3.00/1M tokens
Claude Opus 4.1
Anthropic
MM
Best score:0.8 (TAU)
Released:Aug 2025
Price:$15.00/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.