Key Specifications
Parameters
-
Context
200.0K
Release Date
February 29, 2024
Average Score
73.8%
Timeline
Key dates in the model's history
Announcement
February 29, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$3.00
Output (per 1M tokens)
$15.00
Max Input Tokens
200.0K
Max Output Tokens
200.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
HellaSwag
10-shot In a 10-shot prompt, the model is first shown 10 worked examples with their correct answers, followed by the target question it must answer. The examples demonstrate how to solve the problem; they should be diverse enough to cover a range of cases and solution styles, and should not be too similar to one another or to the target question. 10-shot prompting often produces better results than 0-shot and 1-shot methods and can approach the effectiveness of fine-tuning on some tasks. The trade-off is that the examples occupy a large part of the prompt: 10-shot prompting is most useful when the task benefits from demonstrating diverse solution methods, and less so when the task is simple or can be explained with instructions alone (see the sketch below). • Self-reported
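As a rough illustration, here is a minimal sketch of how such an n-shot prompt might be assembled. The function name, prompt format, and toy data are assumptions for illustration, not the harness actually used for this benchmark.

```
# Minimal sketch: assembling an n-shot prompt from worked examples.
# Format and data are illustrative only, not the official HellaSwag harness.

def build_few_shot_prompt(examples, target_question, n_shots=10):
    """Concatenate n worked examples, then the unanswered target question."""
    parts = [
        f"Question: {ex['question']}\nAnswer: {ex['answer']}"
        for ex in examples[:n_shots]
    ]
    # The model is expected to complete the final, unanswered entry.
    parts.append(f"Question: {target_question}\nAnswer:")
    return "\n\n".join(parts)

examples = [
    {"question": "2 + 2 = ?", "answer": "4"},
    {"question": "What is the capital of France?", "answer": "Paris"},
    # ...eight more diverse worked examples...
]
prompt = build_few_shot_prompt(examples, "3 + 5 = ?")
```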
MMLU
5-shot • Self-reported
Programming
Programming skills tests
HumanEval
0-shot To assess model capabilities we use zero-shot (0-shot) evaluation, meaning the model receives no example solutions before being asked to complete the task. We take this approach for several reasons: 1) it matches how most people actually use models; 2) it is the strictest test of a model's abilities, since it cannot simply copy solutions from examples; 3) it evaluates the model's capabilities without additional prompting or scaffolding; 4) it avoids leaking answers through the examples. For difficult tests such as GPQA, the zero-shot setting is especially important, since providing examples can hint at solutions or otherwise skew the evaluation. Using only the question, without examples, gives a cleaner measure of the model's underlying knowledge and reasoning (a scoring sketch follows below). • Self-reported
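For HumanEval-style tasks, scoring typically means running the model's completion against the task's unit tests. The sketch below is a hypothetical, simplified pass/fail check, not the official harness; a real harness runs the untrusted code in a sandbox rather than calling exec() directly.

```
# Hypothetical, simplified pass/fail check for a HumanEval-style task.
# WARNING: exec() on untrusted model output is unsafe; a real harness
# isolates candidates in a sandbox. Shown here for illustration only.

def passes_unit_tests(candidate_code: str, test_code: str) -> bool:
    """Return True if the model's completion passes the task's assertions."""
    namespace = {}
    try:
        exec(candidate_code, namespace)  # define the candidate function
        exec(test_code, namespace)       # run the task's assert statements
        return True
    except Exception:
        return False

candidate = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(passes_unit_tests(candidate, tests))  # True
```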
Mathematics
Mathematical problems and computations
GSM8k
0-shot CoT A chain-of-thought method that has the model reason step by step toward a solution. Unlike few-shot CoT, the model receives no example reasoning chains; it is simply asked to reason before giving its final answer, usually via an instruction such as "Let's solve this task step by step". This instruction leads the model to break a complex task into smaller components, which improves results compared with asking for the answer directly. The effectiveness of 0-shot CoT is typically measured against direct prompting without reasoning on a range of mathematical and logical tasks; research shows that this approach significantly improves model performance, especially on complex problems (see the sketch below). • Self-reported
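A minimal sketch of how the trigger phrase is attached to a question; the exact wording of the trigger varies between papers and harnesses, so treat this as illustrative.

```
# Minimal sketch: turning a plain question into a 0-shot CoT prompt.
# The trigger wording below is one common variant, not a fixed standard.
COT_TRIGGER = "Let's solve this task step by step."

def zero_shot_cot_prompt(question: str) -> str:
    """Append the chain-of-thought trigger so the model reasons before answering."""
    return f"{question}\n{COT_TRIGGER}"

print(zero_shot_cot_prompt(
    "A farmer has 12 eggs and sells 5. How many eggs are left?"
))
```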
MATH
# 0-shot CoT
The 0-shot CoT (zero-shot chain-of-thought) method is based on the observation that a model can be encouraged to think step by step through a complex problem without being shown any examples of such step-by-step solutions.

## Approach
Add a simple prompt such as "Let's solve this step by step", so that the model breaks the solution into sequential reasoning instead of jumping straight to a final answer.

## Advantages
- **Simplicity**: requires no crafted example demonstrations.
- **Generality**: applies to a wide range of tasks and models.
- **Effectiveness**: can significantly improve model performance on complex tasks that require logical reasoning.

## Limitations
- Effectiveness depends on the model's underlying reasoning ability.
- May work less well than few-shot CoT for some specific task types.
- Reasoning quality and answer accuracy can vary with the prompt wording.

## Example
```
Task: He had 5 apples. He ate 2 apples and got 3 apples from a friend. How many apples does he have now?
Prompt with 0-shot CoT: Let's solve this step by step.
```

## Application
0-shot CoT is especially useful for:
- testing a model's reasoning abilities
- situations where there is no time or budget to craft examples
- improving performance across diverse tasks
• Self-reported
MGSM
0-shot A simple and direct method for measuring LLM performance: the task itself is given as the prompt, with no additional instructions or examples. This approach is attractive because it does not depend on the model's ability to adapt to a particular prompt format or example solutions, which makes it well suited for comparisons between models, especially when a model does not have enough context for more elaborate approaches. • Self-reported
Reasoning
Logical reasoning and analysis
BIG-Bench Hard
3-shot CoT In this mode, the prompt shows the model three worked examples before asking it to answer the target question; each example includes the full solution and its justification, and the model is then expected to apply the same kind of reasoning to the new problem. The 3-shot CoT format demonstrates how to break a complex task into more manageable steps, and with several examples the model can identify the relevant patterns and solution strategies. This approach is useful for measuring performance on tasks that require step-by-step thinking, such as mathematical problems, logic puzzles, and other tasks where producing the answer directly is unreliable (see the sketch below). • Self-reported
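A minimal sketch of how a 3-shot CoT prompt might be laid out; the field names, format, and toy example are assumptions, not the actual BIG-Bench Hard prompts.

```
# Minimal sketch: a 3-shot CoT prompt where each example carries a worked
# reasoning chain before its answer. Toy data, illustrative format only.

SHOTS = [
    {
        "question": "If all bloops are razzies and all razzies are lazzies, "
                    "are all bloops lazzies?",
        "reasoning": "All bloops are razzies, and all razzies are lazzies. "
                     "By transitivity, all bloops are lazzies.",
        "answer": "Yes",
    },
    # ...two more examples, each with a reasoning chain...
]

def three_shot_cot_prompt(target_question: str) -> str:
    parts = [
        f"Q: {s['question']}\nReasoning: {s['reasoning']}\nA: {s['answer']}"
        for s in SHOTS[:3]
    ]
    # The model continues from "Reasoning:" for the target question.
    parts.append(f"Q: {target_question}\nReasoning:")
    return "\n\n".join(parts)
```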
DROP
3-shot, F1 score The F1 score is an accuracy measure defined as the harmonic mean of precision and recall. It provides a single metric that balances the two, and is especially useful when both false positives and false negatives matter. In 3-shot F1 evaluation, the model makes its prediction after being shown 3 examples ("shots"), and the prediction is scored with F1 against the reference answer. This measures how well the model can generalize from a small number of examples, which is important for evaluating in-context learning ability (see the sketch below). • Self-reported
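For reference, here is a sketch of token-level F1 in the style used by reading-comprehension benchmarks such as DROP. It is simplified (no answer normalization or multi-span handling), so treat it as illustrative rather than the official scorer.

```
# Sketch of token-level F1 scoring, simplified: real DROP scoring also
# normalizes answers and handles numbers and multi-span references.
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two strings."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)  # multiset overlap
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("born in 1876", "1876"))  # 0.5 (recall 1.0, precision 1/3)
```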
GPQA
0-shot CoT - Diamond Zero-shot chain of thought (0-shot CoT) is an approach that encourages the model to produce step-by-step reasoning by adding a prompt such as "let's think step by step" before the query. This lets the model structure complex reasoning without any example reasoning chains, which usually improves results on tasks that require several steps of thinking. "Diamond" refers to the GPQA Diamond subset, the most difficult split of the benchmark. We use 0-shot CoT for this evaluation because it effectively elicits the model's own reasoning without examples that could bias it toward specific solutions. • Self-reported
Other Tests
Specialized benchmarks
ARC-C
25-shot • Self-reported
MMLU-Pro
0-shot CoT 0-shot Chain-of-Thought (CoT) is a method that prompts the model to think before giving its final answer, typically by means of an instruction such as "Let's think step by step" placed before the request for the answer. Unlike few-shot CoT, which provides example reasoning chains, 0-shot CoT requires no examples at all. It is one of the most widely used methods for improving LLM performance: it substantially improves models' ability to solve mathematical, logical, and multi-step reasoning tasks. The method is especially effective for modern LLMs with strong reasoning abilities, such as GPT-4. Beyond that, 0-shot CoT serves as a basis for other reasoning techniques; for example, it allows models to apply strategies for verifying their own solutions. • Self-reported
License & Metadata
License
proprietary
Announcement Date
February 29, 2024
Last Updated
July 19, 2025
Similar Models
Claude 3.7 Sonnet
Anthropic
Multimodal
Best score: 0.8 (GPQA)
Released: Feb 2025
Price: $3.00/1M tokens
Claude 3.5 Sonnet
Anthropic
Multimodal
Best score: 0.9 (HumanEval)
Released: Oct 2024
Price: $3.00/1M tokens
Claude 3 Haiku
Anthropic
Multimodal
Best score: 0.9 (ARC)
Released: Mar 2024
Price: $0.25/1M tokens
Claude Sonnet 4
Anthropic
Multimodal
Best score: 0.8 (GPQA)
Released: May 2025
Price: $3.00/1M tokens
Claude Opus 4
Anthropic
Multimodal
Best score: 0.8 (GPQA)
Released: May 2025
Price: $15.00/1M tokens
Claude Haiku 4.5
Anthropic
Multimodal
Best score: 0.8 (TAU)
Released: Oct 2025
Price: $1.00/1M tokens
Claude Sonnet 4.6
Anthropic
Multimodal
Best score: 0.9 (GPQA)
Released: Feb 2026
Price: $3.00/1M tokens
Claude 3.5 Sonnet
Anthropic
Multimodal
Best score: 0.9 (HumanEval)
Released: Jun 2024
Price: $3.00/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.