Claude 3 Opus
Multimodal
Claude 3 Opus is Anthropic's most intelligent model, with market-leading performance on highly complex tasks. It handles open-ended prompts and unforeseen scenarios with remarkable fluency and human-like understanding, demonstrating the cutting edge of generative AI.
Key Specifications
Parameters
-
Context
200.0K
Release Date
February 29, 2024
Average Score
81.6%
Timeline
Key dates in the model's history
Announcement
February 29, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$15.00
Output (per 1M tokens)
$75.00
Max Input Tokens
200.0K
Max Output Tokens
200.0K
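As a quick check on the pricing above, per-request cost follows directly from the token counts. A minimal sketch at the list prices in this table (the token counts in the example are illustrative, not from any real request):

```python
# Claude 3 Opus list pricing (per the table above): USD per 1M tokens.
INPUT_PRICE_PER_M = 15.00
OUTPUT_PRICE_PER_M = 75.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at list prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 10,000-token prompt with a 2,000-token completion.
print(request_cost(10_000, 2_000))  # → 0.3
```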
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
HellaSwag
10-shot. In 10-shot prompting, the model is given 10 worked examples of the task and answer format before being asked to complete a new instance. Compared with prompting styles that use fewer examples, 10 examples give the model more context and usually cover a more diverse range of cases and scenarios. 10-shot evaluation is especially useful for complex tasks with a specific answer format, for tasks that admit several logical approaches, and when 0-shot or few-shot prompts underperform. Under this methodology, the score reflects how well the model extracts patterns from the examples and applies them to a new task, which often yields better results than prompting with fewer examples. • Self-reported
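A few-shot prompt of this kind can be assembled mechanically; a minimal sketch, where the (question, answer) pairs are placeholders rather than actual benchmark items:

```python
def build_few_shot_prompt(examples, query, k=10):
    """Concatenate k worked (question, answer) pairs, then the new query."""
    shots = examples[:k]
    parts = [f"Q: {q}\nA: {a}" for q, a in shots]
    parts.append(f"Q: {query}\nA:")  # model completes from here
    return "\n\n".join(parts)

# Placeholder examples standing in for real benchmark items.
examples = [(f"question {i}", f"answer {i}") for i in range(10)]
prompt = build_few_shot_prompt(examples, "new question")
print(prompt.count("Q:"))  # → 11  (10 shots + 1 query)
```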
MMLU
5-shot. The model is first shown several example solutions, and then evaluated on how well it applies what it picked up from those examples to a new task. Concretely, the model is given 5 (task, solution) pairs and then asked to solve a new task. This measures the model's ability to extract templates and solution strategies from a small number of examples. Evaluation runs over 10 different task sets, each containing 5 examples and 1 held-out task, with both the correctness of the final answer and the reasoning leading to it assessed. This shows how well the model adapts to new tasks without prior training on large sets of similar tasks. The reported score is the percentage of test tasks solved correctly across all 10 sets. • Self-reported
Programming
Programming skills tests
HumanEval
0-shot. 0-shot means giving the model only the task instructions, without any examples of inputs, outputs, or results. The model must rely entirely on its pretrained knowledge to interpret the task and generate an answer. For example, a 0-shot prompt simply states the request directly; the model must understand it and respond without any demonstrations of what is wanted. 0-shot is the simplest prompting setup for LLMs, and it is usually the first approach to try before moving on to more complex prompts. • Self-reported
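In the 0-shot HumanEval setting the prompt is just a function signature plus docstring, and scoring executes the completed function against tests. A minimal sketch of that check, where the completion string is hand-written and merely stands in for model output:

```python
# A HumanEval-style task: the prompt is only a signature plus docstring.
prompt = '''def add(a, b):
    """Return the sum of a and b."""
'''

# Stand-in for the model's 0-shot completion (no examples were shown).
completion = "    return a + b\n"

# HumanEval scores by executing the completed function against unit tests.
namespace = {}
exec(prompt + completion, namespace)
print(namespace["add"](2, 3))  # → 5
```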
Mathematics
Mathematical problems and computations
GSM8k
**0-shot CoT** Zero-shot Chain-of-Thought (0-shot CoT) is a method that encourages the model to reason through a problem without any examples. Unlike prompts that ask the model for an immediate answer, 0-shot CoT instructs the model to "think step by step," so it lays out its line of reasoning before giving an answer. This is usually done with a simple phrase such as "Let's solve this step by step" or "Let's think it through," appended after the task description. The approach lets the model generate intermediate reasoning steps, which often leads to higher accuracy, especially on tasks requiring multiple computation steps or logical inferences. Research has shown that 0-shot CoT can significantly improve language-model performance across a range of tasks, including general reasoning, with no examples or additional training required. • Self-reported
MATH
Zero-shot Chain-of-Thought (0-shot CoT) is a technique that encourages step-by-step reasoning without examples, first presented in Kojima et al., "Large Language Models are Zero-Shot Reasoners" (2022). Unlike few-shot CoT, which requires worked reasoning examples, 0-shot CoT uses simple prompts such as "Let's think step by step" or "Let's solve this problem carefully." These phrases lead the model to generate a chain of intermediate reasoning before its final answer. The appeal of 0-shot CoT lies in its simplicity and efficiency: no reasoning examples need to be written for each task. Research has shown that even such simple prompts can significantly improve model performance on reasoning-heavy tasks, especially mathematical and logical ones. Although 0-shot CoT is not as effective as few-shot CoT on complex tasks, it is a practical option when examples are unavailable or hard to construct. • Self-reported
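Appending the trigger phrase is all 0-shot CoT requires at the prompt level; a minimal sketch of the transformation (the task text is an illustrative example):

```python
def zero_shot_cot(task: str) -> str:
    """Append the Kojima et al. (2022) trigger phrase to a bare task."""
    return f"{task}\n\nLet's think step by step."

prompt = zero_shot_cot("If 3 pencils cost $0.75, how much do 8 pencils cost?")
print(prompt.endswith("Let's think step by step."))  # → True
```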
MGSM
0-shot. Example problem and solution in this setting. Task: We have 10 cards numbered 1 to 10 and draw 4 at random. What is the probability that at least one drawn card is numbered higher than 8? Solution: To find the probability that "at least one drawn card is higher than 8," compute the probability that "all drawn cards are at most 8" and subtract it from 1. The number of ways to choose 4 cards from 10 is C(10,4) = 10!/(4!×6!) = 210. The number of ways to choose 4 cards only from those numbered 1 to 8 is C(8,4) = 8!/(4!×4!) = 70. Thus the probability that all drawn cards are at most 8 is 70/210 = 1/3, and the probability that at least one is higher than 8 is 1 − 1/3 = 2/3. Answer: 2/3 • Self-reported
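The complement-counting argument in that worked example can be verified directly:

```python
from math import comb
from fractions import Fraction

total = comb(10, 4)        # ways to choose 4 cards from 10 → 210
none_above_8 = comb(8, 4)  # ways to choose 4 cards all numbered ≤ 8 → 70

p_none = Fraction(none_above_8, total)  # 70/210 = 1/3
p_at_least_one = 1 - p_none             # complement rule
print(p_at_least_one)  # → 2/3
```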
Reasoning
Logical reasoning and analysis
BIG-Bench Hard
3-shot CoT. Chain-of-Thought (CoT) prompting with three examples: the model is given three worked examples showing how to break a complex task into sequential reasoning steps. Each example demonstrates a step-by-step solution, which helps the model structure its own reasoning. Applied to a new task, 3-shot CoT guides the model to follow the demonstrated format and work through the solution in logical stages, which is especially useful for mathematical and logical tasks. The method requires curated examples, but shows improved performance compared with prompting without demonstrated reasoning. • Self-reported
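A 3-shot CoT prompt differs from a plain few-shot prompt in that each example carries its reasoning chain alongside the answer; a minimal sketch, with placeholder shots standing in for real benchmark items:

```python
def build_cot_prompt(examples, query):
    """Each shot shows question, step-by-step reasoning, and answer."""
    parts = [
        f"Q: {q}\nReasoning: {steps}\nA: {a}"
        for q, steps, a in examples
    ]
    parts.append(f"Q: {query}\nReasoning:")  # model reasons, then answers
    return "\n\n".join(parts)

# Placeholder shots (not actual BIG-Bench Hard items).
shots = [
    ("q1", "step 1; step 2", "a1"),
    ("q2", "step 1; step 2", "a2"),
    ("q3", "step 1; step 2", "a3"),
]
prompt = build_cot_prompt(shots, "new question")
print(prompt.count("Reasoning:"))  # → 4  (3 shots + 1 query)
```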
DROP
3-shot, F1 Score • Self-reported
GPQA
0-shot CoT (Diamond subset). To evaluate the model's reasoning on these tasks, the 0-shot Chain-of-Thought (CoT) method is used: the model solves each task without any reasoning examples, with only the instruction "Let's think step by step" appended at the end of the task text. This encourages the model to reason sequentially instead of answering immediately. Using this standard 0-shot CoT setup, the evaluation measures the model's ability to reason without example demonstrations or hints about the answer format, giving a picture of its basic reasoning capability across different scenarios. • Self-reported
Other Tests
Specialized benchmarks
ARC-C
25-shot • Self-reported
MMLU-Pro
0-shot CoT. Chain-of-thought (CoT) is a method in which the model produces intermediate reasoning before its answer, allowing it to keep track of complex, multi-step problems. A prompt such as "let's think step by step" triggers this reasoning during problem solving. Unlike few-shot CoT, which supplies example reasoning chains, 0-shot CoT provides no such examples. The effectiveness of 0-shot CoT can vary substantially with the task, the model, and the specific prompt used. Although few-shot CoT often gives better results, 0-shot CoT can be an effective method for certain task types, especially when examples are unavailable or difficult to construct. • Self-reported
License & Metadata
License
proprietary
Announcement Date
February 29, 2024
Last Updated
July 19, 2025
Similar Models
Claude 3.5 Sonnet
Anthropic
MM
Best score: 0.9 (HumanEval)
Released: Oct 2024
Price: $3.00/1M tokens
Claude Haiku 4.5
Anthropic
MM
Best score: 0.8 (TAU)
Released: Oct 2025
Price: $1.00/1M tokens
Claude Opus 4.5
Anthropic
MM
Best score: 0.9 (TAU)
Released: Nov 2025
Price: $5.00/1M tokens
Claude 3.5 Sonnet
Anthropic
MM
Best score: 0.9 (HumanEval)
Released: Jun 2024
Price: $3.00/1M tokens
Claude Sonnet 4.5
Anthropic
MM
Best score: 0.9 (TAU)
Released: Sep 2025
Price: $3.00/1M tokens
Claude Opus 4.1
Anthropic
MM
Best score: 0.8 (TAU)
Released: Aug 2025
Price: $15.00/1M tokens
Claude Opus 4
Anthropic
MM
Best score: 0.8 (GPQA)
Released: May 2025
Price: $15.00/1M tokens
Claude 3.7 Sonnet
Anthropic
MM
Best score: 0.8 (GPQA)
Released: Feb 2025
Price: $3.00/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.