Key Specifications
Parameters
3.2B
Context
128.0K
Release Date
September 25, 2024
Average Score
55.6%
Timeline
Key dates in the model's history
Announcement
September 25, 2024
Last Update
July 19, 2025
Today
March 25, 2026
Technical Specifications
Parameters
3.2B
Training Tokens
9.0T tokens
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$0.01
Output (per 1M tokens)
$0.02
Max Input Tokens
128.0K
Max Output Tokens
128.0K
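Cost at the listed rates scales linearly with token counts. A minimal sketch, assuming the rates from the table above ($0.01 per 1M input tokens, $0.02 per 1M output tokens); the function name and defaults are illustrative, not part of any API:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_rate: float = 0.01, out_rate: float = 0.02) -> float:
    """Return the USD cost of one request at per-1M-token rates
    (rates taken from the pricing table above; assumed current)."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# e.g. a 100K-token prompt with a 2K-token completion:
print(f"${estimate_cost(100_000, 2_000):.6f}")  # → $0.001040
```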
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
HellaSwag
0-shot, accuracy • Self-reported
MMLU
5-shot, macro_avg/acc • Self-reported
Mathematics
Mathematical problems and computations
GSM8k
8-shot, em_maj1@1
Prompt: We use 8 previous QA pairs as shots in a retrieval setting, retrieving relevant context by embedding similarity. Test accuracy is defined as the majority vote (maj1) of the model's answers over all 8 trials for a single exact match (em), which averages out randomness in the responses. • Self-reported
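The majority-vote exact-match scoring described above can be sketched as follows (a minimal illustration; the function name and answer normalization are assumptions, not the evaluation's actual code):

```python
from collections import Counter

def em_maj1(answers: list[str], gold: str) -> int:
    """Majority-vote exact match: take the most common answer across
    the model's trials and score 1 if it exactly matches the gold answer."""
    majority, _ = Counter(a.strip() for a in answers).most_common(1)[0]
    return int(majority == gold.strip())

# 8 trials for one GSM8k question; 5 of 8 agree on the gold answer "42".
trials = ["42", "42", "41", "42", "42", "40", "42", "41"]
print(em_maj1(trials, "42"))  # → 1
```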
MATH
0-shot, final_em
For each example we send one query to the model and extract the final answer (final_em) from its response once it has fully worked through the solution. Extraction follows three rules: (1) if the answer is stated explicitly (e.g. "The answer is 42"), we take that final answer (here "42"); (2) if the task is multiple choice and the model indicates an option (e.g. "(A)"), we take that option; (3) otherwise we take the final answer from the model's output, and if several numbers appear, we take the last number. • Self-reported
MGSM
CoT (chain of thought), em (exact match) • Self-reported
Reasoning
Logical reasoning and analysis
GPQA
0-shot, accuracy
In this category, we compute the accuracy of the model's predictions on a pre-determined list of questions directly from the model's top-1 output, without any additional prompting or support. • Self-reported
Other Tests
Specialized benchmarks
ARC-C
0-shot, acc
Standard 0-shot evaluation without any additional information or demonstrations. Accuracy is calculated on the evaluation set; because the same 0-shot methodology is used for both evaluation and testing, the two settings match. • Self-reported
BFCL v2
0-shot, accuracy • Self-reported
IFEval
Average of prompt-level and instruction-level accuracy, strict and loose • Self-reported
InfiniteBench/En.MC
0-shot, longbook_choice/acc • Self-reported
InfiniteBench/En.QA
0-shot, longbook_qa/f1 • Self-reported
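The longbook_qa/f1 metric above is a token-overlap F1 between prediction and reference. A minimal sketch using the standard SQuAD-style formulation (assumed here; the benchmark's own normalization steps may differ):

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over
    the multiset of overlapping tokens (SQuAD-style, assumed)."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = Counter(pred) & Counter(ref)   # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(round(token_f1("the red fox", "the fox"), 3))  # → 0.8
```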
Nexus
0-shot, macro_avg/acc • Self-reported
NIH/Multi-needle
0-shot, recall • Self-reported
Open-rewrite
0-shot, micro_avg/rougeL • Self-reported
TLDR9+ (test)
1-shot, rougeL • Self-reported
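Several entries above (Open-rewrite, TLDR9+) use ROUGE-L, the F-measure over the longest common subsequence of tokens. A minimal sketch of the standard metric (illustrative; scoring harnesses typically also apply stemming and tokenization not shown here):

```python
def rouge_l_f(prediction: str, reference: str) -> float:
    """ROUGE-L F-measure via longest common subsequence (LCS) of tokens."""
    p, r = prediction.split(), reference.split()
    # LCS length by dynamic programming.
    dp = [[0] * (len(r) + 1) for _ in range(len(p) + 1)]
    for i, pt in enumerate(p):
        for j, rt in enumerate(r):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if pt == rt
                                else max(dp[i][j + 1], dp[i + 1][j]))
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(p), lcs / len(r)
    return 2 * prec * rec / (prec + rec)

print(round(rouge_l_f("the cat sat on the mat", "the cat is on the mat"), 3))
```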
License & Metadata
License
llama_3_2_community_license
Announcement Date
September 25, 2024
Last Updated
July 19, 2025
Similar Models
Llama 3.1 8B Instruct
Meta
8.0B
Best score:0.8 (ARC)
Released:Jul 2024
Price:$0.20/1M tokens
Llama 3.1 Nemotron Nano 8B V1
NVIDIA
8.0B
Best score:0.5 (GPQA)
Released:Mar 2025
Gemma 2 9B
Google
9.2B
Best score:0.7 (MMLU)
Released:Jun 2024
Ministral 8B Instruct
Mistral AI
8.0B
Best score:0.7 (ARC)
Released:Oct 2024
Price:$0.10/1M tokens
Phi-3.5-mini-instruct
Microsoft
3.8B
Best score:0.8 (ARC)
Released:Aug 2024
Price:$0.10/1M tokens
Phi 4 Mini
Microsoft
3.8B
Best score:0.8 (ARC)
Released:Feb 2025
Qwen2.5 7B Instruct
Alibaba
7.6B
Best score:0.8 (HumanEval)
Released:Sep 2024
Price:$0.30/1M tokens
Qwen2 7B Instruct
Alibaba
7.6B
Best score:0.8 (HumanEval)
Released:Jul 2024
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter count, and benchmark performance. Choose a model to compare, or go to the full catalog to browse all available AI models.