
Llama 3.2 3B Instruct

Meta

Llama 3.2 3B Instruct is a large language model with a 128K-token context window. It is a state-of-the-art option in its class for on-device use, running locally at the edge for tasks such as summarization, instruction following, and text rewriting.
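For a quick local test, here is a minimal sketch using the Hugging Face transformers pipeline; it assumes a recent transformers release and access to the gated meta-llama/Llama-3.2-3B-Instruct checkpoint on the Hub:

```python
# Minimal local-inference sketch with Hugging Face transformers.
# Assumes the gated meta-llama/Llama-3.2-3B-Instruct checkpoint is available.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You rewrite text concisely."},
    {"role": "user", "content": "Rewrite: 'The meeting has been moved to 3pm.'"},
]
out = pipe(messages, max_new_tokens=128)
# With chat-style input, generated_text holds the full conversation;
# the last message is the assistant's reply.
print(out[0]["generated_text"][-1]["content"])
```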

Key Specifications

Parameters
3.2B
Context
128.0K
Release Date
September 25, 2024
Average Score
55.6%

Timeline

Key dates in the model's history
Announcement
September 25, 2024
Last Update
July 19, 2025

Technical Specifications

Parameters
3.2B
Training Tokens
9.0T tokens
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$0.01
Output (per 1M tokens)
$0.02
Max Input Tokens
128.0K
Max Output Tokens
128.0K
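As a sanity check on the listed rates, a small sketch that estimates per-request cost from the per-1M-token prices above (the token counts are made-up examples):

```python
# Hypothetical cost estimate from the listed per-1M-token rates.
INPUT_PER_M = 0.01   # USD per 1M input tokens (from the table above)
OUTPUT_PER_M = 0.02  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed rates."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# Example: a 4,000-token prompt with a 500-token completion.
print(f"${request_cost(4_000, 500):.6f}")  # -> $0.000050
```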
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
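Function calling is listed as supported; below is a hedged sketch of passing a tool definition through the transformers chat template. The get_current_temperature helper is illustrative only, and the exact template behavior depends on the transformers version:

```python
# Sketch of function calling via the transformers chat template.
# The tool below is a made-up example; the model emits a JSON tool call
# that the application must parse and execute itself.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

def get_current_temperature(city: str) -> float:
    """Get the current temperature in a city.

    Args:
        city: Name of the city.
    """
    ...

messages = [{"role": "user", "content": "How warm is it in Oslo right now?"}]
prompt = tok.apply_chat_template(
    messages,
    tools=[get_current_temperature],  # schema is derived from the signature/docstring
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)  # the prompt now embeds the tool's JSON schema for the model
```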

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
HellaSwag
0-shot, accuracy. Self-reported
69.8%
MMLU
5-shot, macro_avg/acc. Self-reported
63.4%

Mathematics

Mathematical problems and computations
GSM8k
8-shot, em_maj1@1. Self-reported
77.7%
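The em_maj1@1 metric name suggests exact match on a majority-vote answer; here is a minimal sketch of that aggregation, assuming the common reading where maj1@1 means a single (k=1) sample:

```python
# Sketch of exact match with majority voting (maj@k); em_maj1@1 is
# commonly read as the k=1 case, i.e. exact match on one greedy answer.
# This helper is an assumption, not Meta's published harness.
from collections import Counter

def em_maj_at_k(sampled_answers: list[str], gold: str) -> int:
    """Return 1 if the majority-vote answer exactly matches the gold answer."""
    majority, _ = Counter(sampled_answers).most_common(1)[0]
    return int(majority == gold)

print(em_maj_at_k(["42"], "42"))              # k=1, as in em_maj1@1 -> 1
print(em_maj_at_k(["42", "41", "42"], "42"))  # k=3 majority vote -> 1
```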
MATH
0-shot, final_em. One query per example; the final answer (final_em) is extracted from the model's full solution: (1) if the answer is stated explicitly (e.g. "The answer is 42"), that value is taken ("42"); (2) if the task is multiple choice and the model indicates an option (e.g. "(A)"), that option is taken; (3) otherwise the answer is extracted from the response directly, taking the last number if several appear. Self-reported
48.0%
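A rough sketch of that three-step extraction; the regular expressions are assumptions for illustration, not Meta's published harness:

```python
# Rough final-answer extraction in the spirit of the three rules above.
# The patterns are illustrative assumptions only.
import re

def extract_final_answer(response: str) -> str | None:
    # (1) Explicit statement, e.g. "The answer is 42".
    m = re.search(r"[Tt]he answer is\s*([^\s.]+)", response)
    if m:
        return m.group(1)
    # (2) Multiple choice, e.g. "(A)".
    m = re.search(r"\(([A-E])\)", response)
    if m:
        return m.group(1)
    # (3) Fallback: the last number in the response.
    nums = re.findall(r"-?\d+(?:\.\d+)?", response)
    return nums[-1] if nums else None

print(extract_final_answer("Adding them gives 17, so the answer is 21."))  # "21"
```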
MGSM
CoT, em. Self-reported
58.2%

Reasoning

Logical reasoning and analysis
GPQA
0-shot, accuracy. Accuracy is computed directly on the model's top-1 output, without additional prompting. Self-reported
32.8%

Other Tests

Specialized benchmarks
ARC-C
0-shot, acc. Standard 0-shot evaluation without additional context or demonstrations. Self-reported
78.6%
BFCL v2
0-shot, accuracy. Self-reported
67.0%
IFEval
Average of prompt-level and instruction-level accuracy (strict/loose). Self-reported
77.4%
InfiniteBench/En.MC
0-shot, longbook_choice/acc. Self-reported
63.3%
InfiniteBench/En.QA
0-shot, longbook_qa/f1. Self-reported
19.8%
Nexus
0-shot, macro_avg/acc. Self-reported
34.3%
NIH/Multi-needle
0-shot, recall. Self-reported
84.7%
Open-rewrite
0-shot, micro_avg/rougeL. Self-reported
40.1%
TLDR9+ (test)
1-shot, rougeL. Self-reported
19.0%
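Open-rewrite and TLDR9+ are scored with ROUGE-L; a minimal scoring sketch using Google's rouge-score package (the texts are made-up examples):

```python
# ROUGE-L scoring sketch with the rouge-score package
# (pip install rouge-score); the summaries are made-up examples.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
score = scorer.score(
    "the cat sat on the mat",  # reference summary
    "the cat lay on the mat",  # model output
)
print(score["rougeL"].fmeasure)
```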

License & Metadata

License
Llama 3.2 Community License
Announcement Date
September 25, 2024
Last Updated
July 19, 2025

Similar Models

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.