Key Specifications
Parameters
10.6B
Context
128.0K
Release Date
September 25, 2024
Average Score
63.6%
Timeline
Key dates in the model's history
Announcement
September 25, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
10.6B
Training Tokens
-
Knowledge Cutoff
December 31, 2023
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$0.18
Output (per 1M tokens)
$0.18
Max Input Tokens
128.0K
Max Output Tokens
128.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
accuracy • Self-reported
Mathematics
Mathematical problems and computations
MATH
0-shot, CoT. This baseline method uses chain-of-thought reasoning without additional examples: the model is asked to think step by step, but no demonstrations of how to do so are provided. Instructions such as "Let's solve this task step by step" prompt the model to build a chain of reasoning leading to the answer instead of generating the answer directly. This approach is especially useful for complex tasks that require multi-step reasoning, such as math problems or logic puzzles, where the intermediate steps matter for reaching the correct answer • Self-reported
MGSM
0-shot, CoT. Chain-of-thought reasoning without examples: the model solves the task via step-by-step reasoning but is not shown what that reasoning should look like, typically elicited with a prompt like "Let's think step by step". The method is widely used in LLM research because it is simple and effective; even without worked solutions, modern models usually generate the intermediate steps on their own • Self-reported
Reasoning
Logical reasoning and analysis
GPQA
CoT. The chain-of-thought (CoT) approach asks the model to answer without being given solved examples; instead, it is told to "think step by step" so it can structure its reasoning and break a complex task into simpler parts. For example, a standard prompt would simply be "17 × 28", while the CoT version is "17 × 28. Let's think step by step." Research has shown that merely adding the phrase "Let's think step by step" to a query significantly improves the model's ability to solve tasks requiring several reasoning steps • Self-reported
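To make the zero-shot CoT setup used in the MATH, MGSM, and GPQA entries above concrete, here is a minimal prompting sketch. The `make_cot_prompt` and `query_model` names are hypothetical placeholders, not part of any benchmark harness or real API.

```python
# Minimal sketch of zero-shot Chain-of-Thought prompting.
# `query_model` is a hypothetical stub for your model client.

def make_cot_prompt(question: str) -> str:
    """Append the standard zero-shot CoT trigger phrase to a question."""
    return f"{question}\nLet's think step by step."


def query_model(prompt: str) -> str:
    # Hypothetical stub: swap in your actual inference call here.
    raise NotImplementedError("replace with a real model client")


if __name__ == "__main__":
    # Standard prompt vs. CoT prompt for the example from the GPQA note.
    print("17 × 28")                   # direct answer requested
    print(make_cot_prompt("17 × 28"))  # intermediate reasoning elicited
```

The only change between the two settings is the trigger phrase; no worked examples are added, which is what makes the setup "0-shot".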
Multimodal
Working with images and visual data
AI2D
Test accuracy • Self-reported
ChartQA
Test, 0-shot CoT, relaxed accuracy • Self-reported
DocVQA
ANLS (Average Normalized Levenshtein Similarity): an evaluation metric for tasks that extract text from images, measuring the similarity between a predicted answer and the reference text. For each example, the Levenshtein edit distance between prediction and reference is computed and normalized by the length of the longer string; the per-example score is 1 minus this normalized distance. ANLS is the average of these scores over all examples. It is used in OCR and document visual question answering tasks, and ranges from 0 to 1, where 1 means an exact match between the predicted and reference text • Self-reported
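To make the ANLS computation above concrete, here is a minimal sketch assuming the standard DocVQA formulation, which zeroes scores whose normalized distance exceeds a threshold of 0.5 and compares case-insensitively. Function names are illustrative, not taken from any benchmark library.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(curr[-1] + 1, prev[j] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def anls(predictions: list[str], references: list[list[str]],
         tau: float = 0.5) -> float:
    """Average Normalized Levenshtein Similarity over a dataset.

    Per example: best score over reference answers of 1 - NL(pred, ref),
    where NL is edit distance divided by the longer string's length.
    Scores with NL >= tau are zeroed (standard DocVQA threshold).
    """
    scores = []
    for pred, refs in zip(predictions, references):
        best = 0.0
        for ref in refs:
            nl = levenshtein(pred.lower(), ref.lower()) / max(len(pred), len(ref), 1)
            if nl < tau:
                best = max(best, 1.0 - nl)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0
```

For example, `anls(["approved"], [["Approved"]])` returns 1.0 after lowercasing, while a prediction more than half its length in edits away from every reference scores 0.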
MathVista
Accuracy • Self-reported
MMMU
Val, 0-shot CoT, accuracy • Self-reported
Other Tests
Specialized benchmarks
MMMU-Pro
Accuracy: measures how well the model performs on data it did not see during training, usually computed as the percentage of correct answers out of the total number of examples in the test set • Self-reported
VQAv2 (test)
Accuracy • Self-reported
License & Metadata
License
Llama 3.2 Community License
Announcement Date
September 25, 2024
Last Updated
July 19, 2025
Similar Models
Llama 3.2 90B Instruct
Meta
Multimodal · 90.0B
Best score: 0.9 (MMLU)
Released: Sep 2024
Price: $1.20/1M tokens
Llama 4 Maverick
Meta
Multimodal · 400.0B
Best score: 0.9 (MMLU)
Released: Apr 2025
Price: $0.27/1M tokens
Llama 4 Scout
Meta
Multimodal · 109.0B
Best score: 0.8 (MMLU)
Released: Apr 2025
Price: $0.18/1M tokens
Llama 3.1 70B Instruct
Meta
70.0B
Best score: 0.9 (ARC)
Released: Jul 2024
Price: $0.89/1M tokens
DeepSeek VL2
DeepSeek
Multimodal · 27.0B
Released: Dec 2024
Price: $9.50/1M tokens
DeepSeek VL2 Small
DeepSeek
Multimodal · 16.0B
Released: Dec 2024
Llama 3.3 70B Instruct
Meta
70.0B
Best score: 0.9 (HumanEval)
Released: Dec 2024
Price: $0.88/1M tokens
GPT OSS 20B
OpenAI
Multimodal · 20.0B
Best score: 0.9 (MMLU)
Released: Aug 2025
Price: $0.10/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.