
Llama 3.2 11B Vision Instruct

Multimodal
Meta

Llama 3.2 11B Vision Instruct is an instruction-tuned multimodal large language model optimized for visual recognition, image analysis, captioning, and answering general questions about images. The model accepts text and images as input and generates text output.
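
Below is a minimal sketch of what text-plus-image input looks like in practice, assuming the Hugging Face transformers Mllama integration and the meta-llama/Llama-3.2-11B-Vision-Instruct checkpoint; the image file and prompt are illustrative, and exact argument names may vary across library versions.

```python
# Minimal sketch: image + text in, text out. Assumes transformers >= 4.45
# with the Mllama integration; check current docs for exact argument names.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("photo.jpg")  # illustrative local file
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ]},
]

# The chat template inserts the image placeholder and special tokens.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False,
                   return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```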

Key Specifications

Parameters
10.6B
Context
128.0K
Release Date
September 25, 2024
Average Score
63.6%

Timeline

Key dates in the model's history
Announcement
September 25, 2024
Last Update
July 19, 2025

Technical Specifications

Parameters
10.6B
Training Tokens
-
Knowledge Cutoff
December 31, 2023
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$0.18
Output (per 1M tokens)
$0.18
Max Input Tokens
128.0K
Max Output Tokens
128.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
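
As a quick sanity check on the rates above, here is a short sketch of estimating per-request cost at the listed $0.18 per 1M tokens for both input and output; the token counts in the example are illustrative.

```python
# Per-request cost at the listed rates: $0.18 per 1M tokens for both
# input and output (prices from the table above; token counts illustrative).
INPUT_PRICE_PER_M = 0.18
OUTPUT_PRICE_PER_M = 0.18

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 4,000-token prompt with a 500-token completion.
print(f"${request_cost(4_000, 500):.6f}")  # ~$0.000810
```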

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
MMLU
Accuracy · Self-reported
73.0%

Mathematics

Mathematical problems and computations
MATH
0-shot, CoT. Zero-shot chain-of-thought prompting: the model is instructed to reason step by step, but no worked examples are provided. An instruction such as "Let's solve this problem step by step" prompts the model to produce a chain of intermediate reasoning rather than answer directly. This is especially useful for tasks that require multi-step reasoning, such as math problems or logic puzzles, where the intermediate steps matter for reaching the correct answer. · Self-reported
51.9%
MGSM
0-shot, CoT. Chain-of-thought reasoning without examples: the model solves the task with step-by-step reasoning, typically elicited by a prompt like "think step by step", but is shown no example solutions. This setup is widely used in large-language-model research because it is simple and effective; although the model receives no worked solutions, modern models usually generate the reasoning steps on their own. · Self-reported
68.9%

Reasoning

Logical reasoning and analysis
GPQA
CoT. Chain-of-thought prompting: the model is asked to solve the task without being shown example solutions; instead it is told to "think step by step" so it can structure its reasoning and break a complex problem into simpler parts. For example, instead of simply asking "17 × 28", the prompt becomes "17 × 28. Let's think step by step." Research has shown that merely appending "Let's think step by step" to a query significantly improves a model's ability to solve problems that require several reasoning steps (a prompt sketch follows this section). · Self-reported
32.8%
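
The MATH, MGSM, and GPQA entries above all use zero-shot chain-of-thought prompting. Here is a minimal sketch of such a prompt; the wording is illustrative, and Meta's exact evaluation prompts are not reproduced in this card.

```python
# Illustrative 0-shot chain-of-thought prompt: no worked examples are
# shown; the model is simply told to reason step by step.
def zero_shot_cot_prompt(question: str) -> str:
    return f"{question}\n\nLet's think step by step."

print(zero_shot_cot_prompt("17 x 28 = ?"))
# In a few-shot CoT setup, worked examples with reasoning would be
# prepended instead; the benchmarks above use the 0-shot variant.
```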

Multimodal

Working with images and visual data
AI2D
Test accuracy · Self-reported
91.1%
ChartQA
Test, 0-shot CoT, relaxed accuracy · Self-reported
83.4%
DocVQA
ANLS (Average Normalized Levenshtein Similarity): an evaluation metric for text-extraction-from-image tasks that measures the similarity between a predicted and a ground-truth string. (1) The Levenshtein edit distance between the two strings is computed for each example; (2) the distance is normalized by the length of the longer string; (3) the similarity is 1 minus this normalized distance; (4) ANLS is the average of these similarities over all examples. It is used in OCR, scene-text, and visual information-extraction tasks and ranges from 0 to 1, where 1 means the predicted and ground-truth text match exactly (a computation sketch follows this section). · Self-reported
88.4%
MathVista
Accuracy · Self-reported
51.5%
MMMU
Val, 0-shot CoT, accuracy · Self-reported
50.7%
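
The DocVQA entry above scores with ANLS; a minimal sketch of that computation follows, assuming a single ground-truth answer per example. The 0.5 threshold is the DocVQA convention and is an assumption here, since this card does not state it.

```python
# Minimal ANLS sketch following the definition above. The 0.5 threshold
# (similarities below it count as 0) is the DocVQA convention; treat it
# as an assumption, since the card does not specify it.
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def anls(preds: list[str], golds: list[str], tau: float = 0.5) -> float:
    """Average Normalized Levenshtein Similarity over example pairs."""
    scores = []
    for p, g in zip(preds, golds):
        nl = levenshtein(p, g) / max(len(p), len(g), 1)
        s = 1.0 - nl
        scores.append(s if s >= tau else 0.0)
    return sum(scores) / len(scores)

print(anls(["invoice 42"], ["Invoice 42"]))  # 0.9: close, but not exact
```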

Other Tests

Specialized benchmarks
MMMU-Pro
Accuracy: measures how well the model performs on data it did not see during training, usually computed as the percentage of correct answers out of the total number of test-set examples. · Self-reported
33.0%
VQAv2 (test)
Accuracy · Self-reported
75.2%

License & Metadata

License
Llama 3.2 Community License
Announcement Date
September 25, 2024
Last Updated
July 19, 2025

Similar Models


Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.