Key Specifications
Parameters
-
Context
200.0K
Release Date
March 13, 2024
Average Score
71.5%
Timeline
Key dates in the model's history
Announcement
March 13, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal • ZeroEval
Pricing & Availability
Input (per 1M tokens)
$0.25
Output (per 1M tokens)
$1.25
Max Input Tokens
200.0K
Max Output Tokens
200.0K
Supported Features
Function Calling • Structured Output • Code Execution • Web Search • Batch Inference • Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
HellaSwag
10-shot • Self-reported
MMLU
5-shot • Self-reported
Programming
Programming skills tests
HumanEval
0-shot "0-shot" evaluation refers to testing a model without providing examples of how to perform the task. The model receives only an instruction or query and must generate an answer without in-context training examples. This method measures the model's ability to perform the task using only knowledge acquired during pretraining, with no additional context or task-specific demonstrations. 0-shot testing is especially important for measuring a model's general capabilities and its ability to follow instructions without extra prompting. It is the strictest form of evaluation, since it requires the model to handle a new task with no examples at all. • Self-reported
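A 0-shot prompt can be sketched in a few lines. The function name and wording below are illustrative, not taken from any particular evaluation harness:

```python
# Minimal sketch of a 0-shot prompt: instruction plus query,
# with no worked examples. Names and wording are illustrative.
def build_zero_shot_prompt(instruction: str, question: str) -> str:
    return f"{instruction}\n\nQuestion: {question}\nAnswer:"

prompt = build_zero_shot_prompt(
    "Answer the following question concisely.",
    "What is the capital of France?",
)
print(prompt)
```

The key property is what the prompt does not contain: no demonstrations, only the instruction and the query itself.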
Mathematics
Mathematical problems and computations
GSM8k
0-shot CoT "0-shot Chain of Thought" (0-shot CoT) is an approach in which the model is asked to reason "step by step" while solving a task, without being given examples of such reasoning. The simplest form of 0-shot CoT appends the phrase "let's think step by step" to the query. This encourages the model to generate a chain of logical reasoning before giving its final answer. Unlike prompts where the model may answer immediately, 0-shot CoT pushes it to break a complex problem into more manageable parts, which often leads to more accurate results, especially on hard tasks such as mathematical computation or logical puzzles. The advantage of 0-shot CoT is that it requires no task examples with worked reasoning, which makes the method simpler than few-shot CoT. • Self-reported
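The trigger phrase can be appended mechanically. A minimal sketch, assuming the common "let's think step by step" cue (the exact wording varies between papers):

```python
# Minimal sketch of a 0-shot CoT prompt: a step-by-step cue is
# appended to the question, with no worked reasoning examples.
def build_zero_shot_cot_prompt(question: str) -> str:
    return f"Q: {question}\nA: Let's think step by step."

cot_prompt = build_zero_shot_cot_prompt(
    "A farmer has 3 pens with 7 sheep each. How many sheep in total?"
)
print(cot_prompt)
```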
MATH
0-shot CoT Chain-of-thought prompting without demonstrations: the model is asked to reason step by step without being shown examples of such reasoning. Even without demonstrations, prompting the model to think "step by step" before giving its answer often improves performance significantly. The approach is especially useful when no examples are available, or when tasks are too varied for a few demonstrations to cover them. Asking for step-by-step reasoning encourages the model to decompose the problem and structure its answer, which often leads to more accurate results. As with other reasoning methods, 0-shot CoT helps substantially more for larger models, since they follow instructions better and can generate more complex chains of reasoning. • Self-reported
MGSM
When a model has all the information it needs for a task but still produces an incorrect answer, we call this an inference error. To evaluate a model's ability to derive correct answers from the information available to it, we use tasks that require logical reasoning over that information; examples include word puzzles and multi-step problems that can be solved by reasoning alone. For instance, we can give the model a prompt containing every fact needed for the answer: if the model still answers incorrectly, this indicates that it failed to carry out the inference correctly. This kind of error differs from a knowledge error, where the model would have answered correctly had it possessed the missing information: in an inference error the model already has the information, but something goes wrong in its reasoning or answer-generation process. • Self-reported
Reasoning
Logical reasoning and analysis
BIG-Bench Hard
3-shot CoT Chain-of-Thought (CoT) reasoning with three examples is a method for improving an LLM's reasoning by showing worked, step-by-step solutions. It is standard few-shot prompting with one important difference: each example not only shows the final answer but also demonstrates the intermediate reasoning steps. In 3-shot CoT the model is given three worked examples, each containing: 1. the task/question; 2. the step-by-step reasoning (chain of thought); 3. the final answer. The method is especially effective for mathematical tasks, logical puzzles, and other problems that require multi-step reasoning. Three examples usually provide enough context for the model to pick up the reasoning pattern without making the prompt excessively long. Research shows that models prompted with CoT often solve complex tasks markedly better than with direct prompting, since they break problems into manageable steps and reason sequentially. • Self-reported
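The three-part structure described above (question, reasoning, answer) can be sketched as a prompt builder. The worked examples here are made up for illustration:

```python
# Hypothetical worked examples: (question, reasoning, answer).
EXAMPLES = [
    ("What is 2 + 3?", "2 plus 3 is 5.", "5"),
    ("If I have 10 apples and eat 4, how many remain?",
     "10 minus 4 is 6.", "6"),
    ("What is 3 * 4?", "3 times 4 is 12.", "12"),
]

def build_few_shot_cot_prompt(examples, question):
    # Each demonstration pairs the question with its chain of
    # thought and final answer; the new question comes last,
    # leaving the model to produce both reasoning and answer.
    parts = [
        f"Q: {q}\nA: {reasoning} The answer is {answer}."
        for q, reasoning, answer in examples
    ]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

prompt_3shot = build_few_shot_cot_prompt(EXAMPLES, "What is 6 + 7?")
```

Because each demonstration ends with a "The answer is ..." line, the model tends to imitate both the reasoning style and the answer format.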
DROP
3-shot, F1 score The F1 metric evaluates model performance on tasks that require several steps of reasoning. In the 3-shot setting the model is given three worked examples before it attempts a new task. F1 is the harmonic mean of precision and recall, and is especially useful when both false positives and false negatives matter. In this context the F1 score measures how well the model's answer overlaps with the reference answer, independently of the number of examples provided. • Self-reported
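Token-level F1 of the kind used for DROP-style answers can be sketched as the harmonic mean of precision and recall over overlapping answer tokens. This is a simplified sketch (whitespace tokenization and lowercasing only), without the answer normalization a real harness applies:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of precision and recall over overlapping tokens.

    Simplified sketch: whitespace tokenization, lowercasing only.
    """
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts shared tokens with multiplicity.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0  # also covers empty prediction or reference
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("the answer is 42", "42")` gives 0.4: precision is 1/4 (one of four predicted tokens matches), recall is 1/1, and the harmonic mean of 0.25 and 1.0 is 0.4.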
GPQA
0-shot CoT The model produces intermediate reasoning steps on its way to an answer, without worked examples in the prompt: it solves the task and lays out its chain of thought within the solution. This differs from plain 0-shot in that the model does not simply give an immediate answer but also shows its working. It also differs from few-shot chain-of-thought prompting, where worked reasoning examples are included in the prompt: in 0-shot CoT a simple cue such as "let's think step by step" is enough for the model to show intermediate steps on its own. Example: for a mathematical question, the model not only gives the answer but also shows the stages of the solution, even though the prompt did not ask it to explain its working. • Self-reported
Other Tests
Specialized benchmarks
ARC-C
25-shot The 25-shot method is a technique in which the model is given 25 examples of previous answers or worked tasks before it solves a new one. This approach is especially useful for conditioning the model on the expected format and style of answer, and it usually gives better results than methods with fewer examples, such as 0-shot (no examples) or few-shot (a handful of examples). Providing 25 fully worked examples gives the model ample context to identify the required pattern, which is not a problem for the long context windows of modern LLMs. The drawback is that the method requires a sufficiently large pool of representative examples for the task. • Self-reported
License & Metadata
License
proprietary
Announcement Date
March 13, 2024
Last Updated
July 19, 2025
Similar Models
Claude Sonnet 4
Anthropic
MM
Best score: 0.8 (GPQA)
Released: May 2025
Price: $3.00/1M tokens
Claude Opus 4
Anthropic
MM
Best score: 0.8 (GPQA)
Released: May 2025
Price: $15.00/1M tokens
Claude 3.7 Sonnet
Anthropic
MM
Best score: 0.8 (GPQA)
Released: Feb 2025
Price: $3.00/1M tokens
Claude 3 Sonnet
Anthropic
MM
Best score: 0.9 (ARC)
Released: Feb 2024
Price: $3.00/1M tokens
Claude 3.5 Sonnet
Anthropic
MM
Best score: 0.9 (HumanEval)
Released: Oct 2024
Price: $3.00/1M tokens
Claude Sonnet 4.6
Anthropic
MM
Best score: 0.9 (GPQA)
Released: Feb 2026
Price: $3.00/1M tokens
Claude Opus 4.6
Anthropic
MM
Best score: 1.0 (TAU)
Released: Feb 2026
Price: $5.00/1M tokens
Claude Sonnet 4.5
Anthropic
MM
Best score: 0.9 (TAU)
Released: Sep 2025
Price: $3.00/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.