Key Specifications
Parameters
8.0B
Context
131.1K
Release Date
July 23, 2024
Average Score
61.3%
Timeline
Key dates in the model's history
Announcement
July 23, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
8.0B
Training Tokens
15.0T tokens
Knowledge Cutoff
December 31, 2023
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$0.20
Output (per 1M tokens)
$0.20
Max Input Tokens
131.1K
Max Output Tokens
131.1K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
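As a rough illustration of the function-calling feature listed above, the payload below sketches the shape of a tool-enabled chat request to a hypothetical OpenAI-compatible endpoint serving this model. The model id and the `get_weather` tool schema are assumptions for the example, not part of this card.

```python
# Sketch of a function-calling request payload (OpenAI-compatible schema).
# The deployment id and tool definition are illustrative assumptions.
import json

payload = {
    "model": "llama-3.1-8b-instruct",  # hypothetical deployment id
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}
print(json.dumps(payload, indent=2))
```

The model is expected to reply with a `tool_calls` entry naming `get_weather` and its arguments rather than with free text.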
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
5-shot • Self-reported
Programming
Programming skills tests
HumanEval
0-shot: the model receives the question and must solve it directly, without any examples or guidance on how to approach the task. This is the simplest prompting setup, in which the model gets a direct query and must answer immediately; it measures the model's basic ability to solve tasks without additional help or context • Self-reported
Reasoning
Logical reasoning and analysis
DROP
prompt • Self-reported
GPQA
Visual understanding is increasingly valued across AI. Although LLMs such as GPT excel at language, they were not built for visual information. Multimodal systems such as GPT-4 with Vision and Claude 3, however, can process and understand both images and text, handling tasks that range from interpreting images to describing complex visual scenes combined with text. To evaluate a model's visual capabilities we check several key aspects: 1. Accurate description of images 2. Text recognition (OCR) 3. Analysis of charts and data 4. Understanding of complex visual scenes 5. Combining visual and textual information. This evaluation shows how well a model can "see" and interpret images, opening new applications for AI in various fields • Self-reported
Other Tests
Specialized benchmarks
API-Bank
0-shot: the model receives the task without any preliminary examples, instructions, or context, and must answer relying solely on its training and the information given in the task itself. Used to measure the model's baseline abilities and limitations • Self-reported
ARC-C
In this method we give the language model the task directly, without any examples or instructions on how to solve it. For multiple-choice tasks we provide the question and the answer options, and ask the model to choose an answer and explain its choice. For open-ended tasks we simply ask the model to answer the question. The answer is taken directly from the model's response. Prompt for multiple-choice tasks: ``` Please answer the following question and explain your reasoning. <question> (A) <option A> (B) <option B> ... ``` Prompt for open-ended tasks: ``` Please answer the following question. <question> ``` • Self-reported
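The multiple-choice prompt template above can be sketched as a small helper; the function name and the sample question are illustrative, not part of any official evaluation harness.

```python
# Minimal sketch of a 0-shot multiple-choice prompt builder.
# build_mc_prompt and the sample question are illustrative only.

def build_mc_prompt(question: str, options: list[str]) -> str:
    """Format a multiple-choice question as a single 0-shot prompt."""
    lines = [
        "Please answer the following question and explain your reasoning.",
        question,
    ]
    for letter, option in zip("ABCDEFGH", options):
        lines.append(f"({letter}) {option}")
    return "\n".join(lines)

prompt = build_mc_prompt(
    "Which planet is closest to the Sun?",
    ["Venus", "Mercury", "Earth"],
)
print(prompt)
```

The model's answer letter is then parsed out of its free-text response and compared against the gold label.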
BFCL
In our research we evaluate the model in a 0-shot setting: it is given the task without any preliminary examples or instructions about the answer format. This is the most demanding setup, since the model must understand both the task and the expected answer relying only on its pretrained knowledge, and it better reflects real usage, where users often ask questions without providing sample answers. We analyze: 1. the model's ability to correctly interpret queries 2. the quality of explanations given without a preceding example 3. answer accuracy without additional context 4. robustness across varied questions. Evaluating 0-shot performance is especially important for understanding how the model behaves in real conditions, where users rarely provide sample answers • Self-reported
Gorilla Benchmark API Bench
0-shot: in this mode the model answers the question directly, without any additional instructions or guidance on how to reason. This corresponds to everyday use, where the user simply asks a question and the model gives an answer without extra prompting. For example, given the query "Solve the equation x² + 5x + 6 = 0", the system simply solves the equation directly. This is the basic mode for most interactions with LLMs and measures the model's abilities without any additional reasoning scaffolding • Self-reported
GSM-8K (CoT)
8-shot • Self-reported
IFEval
Self-reported
MATH (CoT)
0-shot: in the context of large language models (LLMs), "0-shot" refers to the model's ability to perform a task without any examples. The model must rely solely on knowledge acquired during pretraining to understand the task and generate an answer. In the 0-shot setting the user simply describes the task or question without providing samples of what the answer should look like. This contrasts with the few-shot approach, in which the user supplies one or more examples demonstrating the format or style of reasoning. 0-shot testing is a strict check of the model's task understanding and of its ability to apply its knowledge to new problems without additional context or examples. It is also the most common way of interacting with LLMs in real-world use • Self-reported
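The 0-shot vs. few-shot contrast described above can be illustrated with a minimal prompt builder; the arithmetic demonstrations and the Question/Answer layout are hypothetical, chosen only to show how the two prompt styles differ.

```python
# Illustrative contrast between 0-shot and few-shot prompting.
# The demo problems and format are hypothetical.

def zero_shot(task: str) -> str:
    """0-shot: the task alone, no worked examples."""
    return f"Question: {task}\nAnswer:"

def few_shot(task: str, examples: list[tuple[str, str]]) -> str:
    """Few-shot: prepend (question, answer) demonstrations before the task."""
    demos = "\n\n".join(f"Question: {q}\nAnswer: {a}" for q, a in examples)
    return f"{demos}\n\nQuestion: {task}\nAnswer:"

demos = [("2 + 2 = ?", "4"), ("10 - 3 = ?", "7")]
print(zero_shot("5 * 6 = ?"))
print(few_shot("5 * 6 = ?", demos))
```

The few-shot variant shows the model the expected answer format before the real task, which is exactly the extra context the 0-shot setting withholds.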
MBPP EvalPlus (base)
## Evaluation without examples
Evaluation without examples (0-shot) is an approach in which the LLM solves a task without any preliminary examples or samples. The model must use only the instructions in the prompt and its pretrained knowledge to form an answer.
### Application
0-shot evaluation is usually applied for:
- Measuring a model's basic capabilities without additional help
- Assessing a model's ability to understand and follow instructions
- Probing knowledge acquired during pretraining
- Establishing a baseline level of performance for comparison with other prompting methods
### Advantages
- Simplicity: requires no example creation
- Reflects real usage scenarios, where queries test the model's internal knowledge rather than its ability to copy templates
### Limitations
- Generally gives weaker results compared with few-shot methods
- The model may misunderstand the task without examples
- Harder for complex or unfamiliar tasks
### Example query
```
Solve the task: find the value of x in 3x + 7 = 22
```
This query contains no examples of what the answer should look like or which solution steps to follow • Self-reported
MMLU (CoT)
The standard 0-shot setting means the model performs the task without examples. In many cases 0-shot evaluation consists simply of asking the model for an answer, often for tasks with fixed answer choices. For multiple-choice tasks the model can complete a 0-shot task simply by selecting the correct answer; in more complex tasks it may generate reasoning leading to the answer. In open-ended tasks the model must not only generate the answer but also determine its format. Sometimes additional instructions specify the answer format; in other cases the model must infer it from the task itself. Note that there are also cases where the model must itself decide which of several candidate answers is correct • Self-reported
MMLU-Pro
5-shot • Self-reported
Multilingual MGSM (CoT)
The 0-shot method means you simply ask the LLM a question and immediately take its answer. This is the cheapest approach from a usage point of view and very convenient for computing scores over large question sets. However, it may not reveal the model's full capabilities, since it does not let the model refine and correct its answers. This method demonstrates baseline performance compared with approaches that allow the model to revise its answers, try different problem-solving strategies, or retrieve information. Nonetheless, it allows quick comparison of the base performance of different models, especially when many questions must be evaluated at once • Self-reported
Multipl-E HumanEval
The "single task" method: in this method we give the model one task, with no examples or instructions on how to solve it. This is the standard way of evaluating LLMs in benchmarks. Example query: "If I am traveling at 50, how long will it take me to reach 450?" This is the basic approach to model evaluation, giving a picture of how well the model "understands" a task without additional context. When it is most effective: this method works well for simple tasks, or when the model has already seen the specific task type in its training data. Disadvantages: for more complex tasks, or ones requiring novel approaches, a bare task statement is often not enough; the model may not understand the expected format or may interpret the task incorrectly • Self-reported
Multipl-E MBPP
0-shot: in this mode we simply ask the model to answer the question directly, without any additional instructions, using short direct factual questions. Such queries assess the model's basic knowledge but reveal little about its reasoning. 0-shot testing usually gives good results on simple questions but struggles with complex tasks • Self-reported
Nexus
0-shot means using an LLM to solve new tasks without giving it examples of how to perform the task or additional task-specific instructions. This approach matters for testing, since it evaluates how well the model can independently interpret a task and apply its knowledge, which is closer to how models are used in the real world, and it shows the model's baseline performance in new situations. For example, in a mathematical task, 0-shot means the model is simply given the problem, such as "Solve the equation: 2x + 5 = 15", without example solutions of similar problems or solution instructions • Self-reported
License & Metadata
License
llama_3_1_community_license
Announcement Date
July 23, 2024
Last Updated
July 19, 2025
Similar Models
Llama 3.2 3B Instruct
Meta
3.2B
Best score: 0.8 (ARC)
Released: Sep 2024
Price: $0.01/1M tokens
Gemma 2 9B
9.2B
Best score: 0.7 (MMLU)
Released: Jun 2024
Phi 4 Mini
Microsoft
3.8B
Best score: 0.8 (ARC)
Released: Feb 2025
Phi-3.5-mini-instruct
Microsoft
3.8B
Best score: 0.8 (ARC)
Released: Aug 2024
Price: $0.10/1M tokens
Qwen2.5 7B Instruct
Alibaba
7.6B
Best score: 0.8 (HumanEval)
Released: Sep 2024
Price: $0.30/1M tokens
Qwen2 7B Instruct
Alibaba
7.6B
Best score: 0.8 (HumanEval)
Released: Jul 2024
Llama 3.1 405B Instruct
Meta
405.0B
Best score: 1.0 (ARC)
Released: Jul 2024
Price: $3.50/1M tokens
Llama 3.1 70B Instruct
Meta
70.0B
Best score: 0.9 (ARC)
Released: Jul 2024
Price: $0.89/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.