Llama 3.1 70B Instruct

Meta

Llama 3.1 70B Instruct is a large language model optimized for multilingual conversational use cases. It outperforms many available open and closed chat models on standard industry benchmarks.

Key Specifications

Parameters
70.0B
Context
128.0K
Release Date
July 23, 2024
Average Score
74.7%

Timeline

Key dates in the model's history
Announcement
July 23, 2024
Last Update
July 19, 2025
Today
March 25, 2026

Technical Specifications

Parameters
70.0B
Training Tokens
15.0T tokens
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$0.89
Output (per 1M tokens)
$0.89
Max Input Tokens
128.0K
Max Output Tokens
128.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
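As a rough illustration of how the per-token prices listed above translate into the cost of a single request, the sketch below multiplies token counts by the listed rates. The helper name `estimate_cost_usd` and the example token counts are hypothetical; only the $0.89 per 1M token rates come from this page.

```python
# Rates listed on this page (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.89
OUTPUT_PRICE_PER_M = 0.89


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Hypothetical helper for illustration; it is not part of any provider API.
    """
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Illustrative token counts only.
    print(f"${estimate_cost_usd(100_000, 20_000):.4f}")
```

Actual billing may differ by provider, so treat the result as an estimate.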

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
MMLU
5-shot
Self-reported
83.6%

Programming

Programming skills tests
HumanEval
A model such as GPT-4 can generate answers that look correct but rest on errors of fact or reasoning, which are difficult to detect without specialist knowledge or verification. Giving the model an answer in advance and asking it to explain what is incorrect in it is one approach to verifying the model's understanding. If the model assesses the answer and gives a correct explanation, this speaks to its ability to critically evaluate information and correct errors. However, if the model accepts an incorrect answer as correct, or tries to justify it, this can point to gaps in its understanding or in its training. The method is especially useful for evaluating model behavior in fields where correct answers exist and can be checked, for example in mathematics or factual information.
Self-reported
80.5%

Reasoning

Logical reasoning and analysis
DROP
0-shot
We evaluate the model's ability to answer a question directly, without examples, instructions, or additional context. This allows us to evaluate the model's basic abilities. In this setting the model is given either only the question or task, or the question together with a note about the answer format. This mode checks how well the model understands and solves tasks when relying only on its pretrained knowledge. It is especially important for evaluating the model's ability to interpret tasks correctly without additional hints or examples.
Self-reported
79.6%
GPQA
0-shot
In AI, "0-shot" (zero-shot) refers to evaluating a machine learning model's ability to perform a task without any examples or instructions specific to that task. The model is evaluated only on its ability to apply its general training to a new task, not on task-specific examples. For example, to evaluate the 0-shot abilities of an LLM, we can ask it to solve a task it has not seen, without providing sample solutions. 0-shot is often contrasted with few-shot approaches, where the model is given several examples before the task. 0-shot is especially important for evaluating a model's ability to generalize and its level of understanding of the domain.
Self-reported
41.7%
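The difference between 0-shot and few-shot prompting described above can be sketched as a single prompt builder: with no examples it produces a 0-shot prompt, and with examples it produces a few-shot prompt. The function name `build_prompt` and the "Question:/Answer:" layout are assumptions for illustration, not the harness used to produce the scores on this page.

```python
from typing import Sequence, Tuple


def build_prompt(question: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a prompt that is 0-shot when `examples` is empty, few-shot otherwise.

    Hypothetical helper; the template is an illustrative assumption.
    """
    # Each solved example is shown in full before the new question.
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    # The new question is left open for the model to complete.
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("What is 2 + 2?"))  # 0-shot
    print("---")
    print(build_prompt("What is 2 + 2?", [("What is 1 + 1?", "2")]))  # 1-shot
```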

Other Tests

Specialized benchmarks
API-Bank
In 0-shot testing, the model does not receive examples of completed tasks with their results. Instead, it must rely exclusively on the knowledge obtained during pretraining to form its answer. This evaluation method shows the model's ability to apply its knowledge to tasks it was not explicitly trained on.
Self-reported
90.0%
ARC-C
0-shot
In 0-shot evaluation, the model receives a task without any examples or additional information and must perform it relying only on its previously obtained knowledge. Unlike approaches with examples (few-shot), where the model can work from several examples, in the 0-shot approach the model must rely exclusively on knowledge obtained during training. This approach demonstrates the model's ability to apply its knowledge to new tasks without additional instructions. 0-shot evaluation is often used to check a model's basic capabilities and its ability to apply knowledge to tasks, that is, as a gauge of its general intelligence.
Self-reported
94.8%
BFCL
Standard evaluation
Self-reported
84.8%
Gorilla Benchmark API Bench
The method without examples (0-shot) means that a task is given without any examples of how to solve it. The model uses only the instructions (prompt) and must work out on its own how to carry out the assignment. This is the most difficult approach for the model, since it receives no additional context or examples of similar tasks being carried out. In the absence of examples, the model relies exclusively on the knowledge obtained during pretraining and on the query. This method is often used to evaluate a model's basic ability to understand and solve tasks without additional help.
Self-reported
29.7%
GSM-8K (CoT)
8-shot Chain-of-Thought
8-shot Chain-of-Thought (CoT) prompting has the model carry out reasoning, guided by several examples, to answer a question. Each example (usually about 8) includes both a question and the step-by-step reasoning that leads to the answer. These examples demonstrate how to break a complex question into a sequence of intermediate steps. When the LLM is presented with a new question after these examples, it produces its own reasoning, a sequence of thinking steps, before giving an answer. This method is especially effective for tasks requiring complex reasoning, such as mathematical problems, logical puzzles, and inference. A feature of 8-shot CoT is that it does not require instructions about how to reason; instead, the model learns this from the examples. This lets the LLM apply step-by-step thinking to tasks without the need for specialized prompts for each type of task.
Self-reported
95.1%
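A minimal sketch of the few-shot Chain-of-Thought layout described above: each exemplar carries a question, its step-by-step reasoning, and the final answer, and the new question is appended last. The `Exemplar` class, the `build_cot_prompt` function, and the "Q:/A:" template are hypothetical names chosen for illustration; the exemplars actually used for the reported score are not given on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Exemplar:
    question: str
    reasoning: str  # step-by-step working shown to the model
    answer: str


def build_cot_prompt(exemplars: List[Exemplar], question: str) -> str:
    """Assemble a k-shot Chain-of-Thought prompt (k = len(exemplars))."""
    blocks = [
        f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}."
        for ex in exemplars
    ]
    # The final block is left open so the model writes its own reasoning.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demo = [
        Exemplar(
            question="Tom has 3 apples and buys 2 more. How many does he have?",
            reasoning="Tom starts with 3 apples. He buys 2 more, so 3 + 2 = 5.",
            answer="5",
        )
    ]
    print(build_cot_prompt(demo, "A box holds 4 pens. How many pens are in 6 boxes?"))
```

Passing eight exemplars to `build_cot_prompt` gives the 8-shot form.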
IFEval
Standard evaluation
Standard evaluations and other tests are used to study model performance across various tasks and to produce results for comparison with other models. They cover many important tasks; nevertheless, these evaluations have several limitations. First, they often evaluate only the model's final answer, not how it arrived at that answer. For example, for the task 97 × 98, a model such as Claude can obtain the correct answer (9506), but checking only the final answer does not show whether it used a sound method such as 97 × 98 = 97 × 100 - 97 × 2 = 9700 - 194 = 9506. Analysis of the intermediate reasoning steps can give insight into how and why a model makes errors. Second, most evaluations are run with the basic model and do not allow models to use capabilities such as tools or thinking. They usually record prompts and answers without the possibility of follow-up queries to the model if an answer is unclear or incomplete. Third, standard evaluations are often scored in a correct/incorrect format, without capturing the degree of correctness of an answer or the model's confidence in solving the task. Finally, existing benchmarks become less informative as more and more models achieve high performance on them, for example on MMLU and other benchmarks.
Self-reported
87.5%
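The decomposition quoted in the 97 × 98 example can be checked mechanically. The sketch below evaluates each intermediate step and compares the result with direct multiplication; the function name `distributive_product` is a hypothetical helper introduced only for this check.

```python
def distributive_product(a: int, b: int, base: int = 100) -> int:
    """Compute a * b as a * base - a * (base - b), mirroring the quoted steps."""
    return a * base - a * (base - b)


if __name__ == "__main__":
    first = 97 * 100   # 9700
    second = 97 * 2    # 194
    print(first, second, first - second)
    print(distributive_product(97, 98) == 97 * 98)
```

Checking intermediate values this way is one simple form of the step-level analysis the passage calls for, as opposed to scoring only the final number.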
MATH (CoT)
0-shot Chain-of-Thought
Chain-of-thought (CoT) prompting has an LLM write out the intermediate steps of its reasoning, which leads to better results on tasks that require reasoning. For assignments that do not require reasoning, CoT usually shows no improvement. In 0-shot CoT, the LLM receives no examples with reasoning; instead, it is simply asked to think "step by step" (or a similar prompt is used). In contrast, in few-shot CoT the model is given examples with reasoning before it is faced with a new task. The method is called "0-shot CoT" because it uses no reasoning examples but still requires a prompt that tells the model to reason step by step.
Self-reported
68.0%
MBPP ++ base version
0-shot
In 0-shot assignments, the model is asked directly to solve tasks without being given examples to learn from. This contrasts with few-shot, where the model is given examples of correct answers to a task before it solves a new problem. 0-shot is one of the most difficult scenarios for a model, since it is required to carry out a task without prior training on the assignments or hints about how to structure the answer. However, it is also one of the most common usage scenarios, since it requires the least effort on the user's side. This method is often used as a baseline when evaluating model performance, because it shows how well the model can apply its knowledge in new contexts without additional guidance. High 0-shot performance indicates that the model acquired an understanding of the tasks during pretraining.
Self-reported
86.0%
MMLU (CoT)
0-shot Chain-of-Thought
Chain-of-thought reasoning without examples (0-shot Chain-of-Thought, CoT) is a prompting method in which a language model breaks its solution process into sequential reasoning steps without being given examples of how to build the reasoning chain. In the standard 0-shot CoT approach, the model receives a query containing "Let's think step by step" (or a similar phrase) before it gives its final answer. This lets the model carry out step-by-step reasoning, which often leads to more accurate answers, especially for complex tasks requiring multi-step reasoning. Unlike few-shot CoT, where the model is given examples of step-by-step reasoning, 0-shot CoT relies on the model's ability to generate reasoning independently, without any examples. This works in modern LLMs, which were trained on varied examples of reasoning and can apply this skill to new tasks even without specific examples.
Self-reported
86.0%
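In code, 0-shot CoT amounts to appending a reasoning trigger to the question instead of prepending worked examples. The sketch below uses the "Let's think step by step" phrase quoted above; the function name `build_zero_shot_cot_prompt` and the "Q:/A:" layout are assumptions made for illustration.

```python
COT_TRIGGER = "Let's think step by step."


def build_zero_shot_cot_prompt(question: str, trigger: str = COT_TRIGGER) -> str:
    """Return a 0-shot CoT prompt: the question plus a reasoning trigger, no examples."""
    return f"Q: {question}\nA: {trigger}"


if __name__ == "__main__":
    print(build_zero_shot_cot_prompt("If a train travels 60 km in 1.5 hours, what is its speed?"))
```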
MMLU-Pro
5-shot Chain-of-Thought
Self-reported
66.4%
Multilingual MGSM (CoT)
0-shot Chain-of-Thought
Self-reported
86.9%
Multipl-E HumanEval
0-shot
As our baseline setting we use 0-shot prompts. That is, we do not give the model example answers to tasks and simply ask it directly. For 0-shot questions from GPQA, the prompt consists of simple instructions and the question: "Question: [question]. Answer:". For mathematics tasks, the task is phrased as: "Solve the following task step by step: [task]". When we evaluate tool use, we add text to the prompt explaining how the tool can be used. For example, for a calculator: "If you need to perform a calculation, use the calculator, whose input goes between <calculator></calculator>. For example, <calculator>12*34</calculator>. Do not perform complex computations yourself. Instead ..."
Self-reported
65.5%
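The calculator convention quoted above implies that something on the evaluation side must find the `<calculator>` tags in the model's output and evaluate their contents. The page does not describe that component, so the following is only a sketch under stated assumptions: the names `safe_eval` and `run_calculator_calls` are hypothetical, and support is limited to `+`, `-`, `*`, and `/` on numeric literals.

```python
import ast
import operator
import re

# Arithmetic operators this sketch is willing to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_CALC_TAG = re.compile(r"<calculator>(.*?)</calculator>", re.DOTALL)


def _eval_node(node: ast.AST):
    """Recursively evaluate a parsed arithmetic expression."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval_node(node.operand)
    raise ValueError("unsupported expression")


def safe_eval(expr: str):
    """Evaluate basic arithmetic without calling eval() on model output."""
    return _eval_node(ast.parse(expr.strip(), mode="eval").body)


def run_calculator_calls(model_output: str) -> list:
    """Evaluate every <calculator>...</calculator> span found in the text."""
    return [safe_eval(expr) for expr in _CALC_TAG.findall(model_output)]


if __name__ == "__main__":
    print(run_calculator_calls("I need <calculator>12*34</calculator> here."))  # [408]
```

Parsing with `ast` rather than calling `eval` keeps arbitrary model-generated text from being executed as code.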
Multipl-E MBPP
0-shot
In this setting the model is given only the question, without any examples. The model must answer the question directly, with no access to examples showing the correct way to answer. This is the most stringent test of the model's ability to follow instructions, since it must understand what is required of it based only on the query.
Self-reported
62.0%
Nexus
In the approach without examples (0-shot), the model uses only the query to carry out the assignment. It receives no examples of how to work on the task, cannot draw on previous similar tasks, and has no opportunity to adjust its behavior based on previous attempts. The model relies on what it learned during pretraining. It must interpret the query and formulate an answer relying only on its own basic abilities. This is the most stringent setting, since it evaluates the model's abilities without any additional help or guidance. The model cannot rely on examples or hints to understand exactly how it should format or structure its answer. Results in 0-shot mode are usually lower than in other settings, but they give the most direct assessment of the model's basic knowledge and reasoning.
Self-reported
56.7%
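The page does not state how the Average Score under Key Specifications is computed. One assumption that reproduces the listed 74.7% is an unweighted mean of the 18 self-reported scores above, as this sketch shows. The `SCORES` dictionary and `average_score` function are illustrative names; the values are copied from this page.

```python
from statistics import mean

# Self-reported scores (percent) as listed on this page.
SCORES = {
    "MMLU": 83.6,
    "HumanEval": 80.5,
    "DROP": 79.6,
    "GPQA": 41.7,
    "API-Bank": 90.0,
    "ARC-C": 94.8,
    "BFCL": 84.8,
    "Gorilla Benchmark API Bench": 29.7,
    "GSM-8K (CoT)": 95.1,
    "IFEval": 87.5,
    "MATH (CoT)": 68.0,
    "MBPP ++ base version": 86.0,
    "MMLU (CoT)": 86.0,
    "MMLU-Pro": 66.4,
    "Multilingual MGSM (CoT)": 86.9,
    "Multipl-E HumanEval": 65.5,
    "Multipl-E MBPP": 62.0,
    "Nexus": 56.7,
}


def average_score(scores: dict) -> float:
    """Unweighted mean of all benchmark scores (an assumed aggregation)."""
    return mean(scores.values())


if __name__ == "__main__":
    print(round(average_score(SCORES), 1))  # 74.7
```

Because the benchmarks use different settings (0-shot, 5-shot, 8-shot CoT), a flat average like this mixes results that are not directly comparable.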

License & Metadata

License
llama_3_1_community_license
Announcement Date
July 23, 2024
Last Updated
July 19, 2025
