Key Specifications
Parameters
10.6B
Context
128.0K
Release Date
September 25, 2024
Average Score
63.6%
Timeline
Key dates in the model's history
Announcement
September 25, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
10.6B
Training Tokens
-
Knowledge Cutoff
December 31, 2023
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$0.18
Output (per 1M tokens)
$0.18
Max Input Tokens
128.0K
Max Output Tokens
128.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
accuracy • Self-reported
Mathematics
Mathematical problems and computations
MATH
0-shot, CoT. This baseline method uses chain-of-thought reasoning without additional examples: the model is asked to think step by step, but no demonstrations of how to do so are provided. Instructions such as "Let's solve this task step by step" prompt the model to build a chain of reasoning leading to the answer instead of generating the answer directly. This approach is especially useful for complex tasks that require multi-step reasoning, such as math problems or logic puzzles, where the intermediate steps matter for reaching the correct answer • Self-reported
MGSM
0-shot, CoT. Chain-of-thought reasoning without examples: the model solves the task via step-by-step reasoning but is not shown what that reasoning should look like, typically elicited with a prompt like "Let's think step by step". The method is widely used in LLM research because it is simple and effective; even without worked solutions, modern models usually generate the intermediate steps on their own • Self-reported
Reasoning
Logical reasoning and analysis
GPQA
CoT. The chain-of-thought (CoT) approach asks the model to answer without being given solved examples; instead, it is told to "think step by step" so it can structure its reasoning and break a complex task into simpler parts. For example, a standard prompt would simply be "17 × 28", while the CoT version is "17 × 28. Let's think step by step." Research has shown that merely adding the phrase "Let's think step by step" to a query significantly improves the model's ability to solve tasks requiring several reasoning steps • Self-reported
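To make the zero-shot CoT setup used in the MATH, MGSM, and GPQA entries above concrete, here is a minimal prompting sketch. The `make_cot_prompt` and `query_model` names are hypothetical placeholders, not part of any benchmark harness or real API.

```python
# Minimal sketch of zero-shot Chain-of-Thought prompting.
# `query_model` is a hypothetical stub for your model client.

def make_cot_prompt(question: str) -> str:
    """Append the standard zero-shot CoT trigger phrase to a question."""
    return f"{question}\nLet's think step by step."


def query_model(prompt: str) -> str:
    # Hypothetical stub: swap in your actual inference call here.
    raise NotImplementedError("replace with a real model client")


if __name__ == "__main__":
    # Standard prompt vs. CoT prompt for the example from the GPQA note.
    print("17 × 28")                   # direct answer requested
    print(make_cot_prompt("17 × 28"))  # intermediate reasoning elicited
```

The only change between the two settings is the trigger phrase; no worked examples are added, which is what makes the setup "0-shot".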
Multimodal
Working with images and visual data
AI2D
Test accuracy • Self-reported
ChartQA
Test, 0-shot CoT, relaxed accuracy • Self-reported
DocVQA
ANLS (Average Normalized Levenshtein Similarity): an evaluation metric for tasks that extract text from images, measuring the similarity between a predicted answer and the reference text. For each example, the Levenshtein edit distance between prediction and reference is computed and normalized by the length of the longer string; the per-example score is 1 minus this normalized distance. ANLS is the average of these scores over all examples. It is used in OCR and document visual question answering tasks, and ranges from 0 to 1, where 1 means an exact match between the predicted and reference text • Self-reported
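To make the ANLS computation above concrete, here is a minimal sketch assuming the standard DocVQA formulation, which zeroes scores whose normalized distance exceeds a threshold of 0.5 and compares case-insensitively. Function names are illustrative, not taken from any benchmark library.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(curr[-1] + 1, prev[j] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def anls(predictions: list[str], references: list[list[str]],
         tau: float = 0.5) -> float:
    """Average Normalized Levenshtein Similarity over a dataset.

    Per example: best score over reference answers of 1 - NL(pred, ref),
    where NL is edit distance divided by the longer string's length.
    Scores with NL >= tau are zeroed (standard DocVQA threshold).
    """
    scores = []
    for pred, refs in zip(predictions, references):
        best = 0.0
        for ref in refs:
            nl = levenshtein(pred.lower(), ref.lower()) / max(len(pred), len(ref), 1)
            if nl < tau:
                best = max(best, 1.0 - nl)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0
```

For example, `anls(["approved"], [["Approved"]])` returns 1.0 after lowercasing, while a prediction more than half its length in edits away from every reference scores 0.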
MathVista
Accuracy • Self-reported
MMMU
Val, 0-shot CoT, accuracy • Self-reported
Other Tests
Specialized benchmarks
MMMU-Pro
Accuracy: measures how well the model performs on data it did not see during training, usually computed as the percentage of correct answers out of the total number of examples in the test set • Self-reported
VQAv2 (test)
Accuracy • Self-reported
License & Metadata
License
Llama 3.2 Community License
Announcement Date
September 25, 2024
Last Updated
July 19, 2025
Similar Models
Llama 3.2 90B Instruct
Meta
Multimodal · 90.0B
Best score: 0.9 (MMLU)
Released: Sep 2024
Price: $1.20/1M tokens
Llama 4 Maverick
Meta
Multimodal · 400.0B
Best score: 0.9 (MMLU)
Released: Apr 2025
Price: $0.27/1M tokens
Llama 4 Scout
Meta
Multimodal · 109.0B
Best score: 0.8 (MMLU)
Released: Apr 2025
Price: $0.18/1M tokens
Llama 3.1 70B Instruct
Meta
70.0B
Best score: 0.9 (ARC)
Released: Jul 2024
Price: $0.89/1M tokens
DeepSeek VL2
DeepSeek
Multimodal · 27.0B
Released: Dec 2024
Price: $9.50/1M tokens
DeepSeek VL2 Small
DeepSeek
Multimodal · 16.0B
Released: Dec 2024
Llama 3.3 70B Instruct
Meta
70.0B
Best score: 0.9 (HumanEval)
Released: Dec 2024
Price: $0.88/1M tokens
GPT OSS 20B
OpenAI
Multimodal · 20.0B
Best score: 0.9 (MMLU)
Released: Aug 2025
Price: $0.10/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.