Llama 3.2 90B Instruct
Multimodal
Llama 3.2 90B is a large multimodal language model optimized for visual recognition, image reasoning, and captioning tasks. It supports a context length of 128,000 tokens and delivers state-of-the-art performance in image understanding and generative tasks.
Key Specifications
Parameters
90.0B
Context
128.0K
Release Date
September 25, 2024
Average Score
71.3%
Timeline
Key dates in the model's history
Announcement
September 25, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
90.0B
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal · ZeroEval
Pricing & Availability
Input (per 1M tokens)
$1.20
Output (per 1M tokens)
$1.20
Max Input Tokens
128.0K
Max Output Tokens
128.0K
Supported Features
Function Calling · Structured Output · Code Execution · Web Search · Batch Inference · Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
0-shot CoT — Chain-of-thought prompting is a method in which the model generates step-by-step reasoning before giving its final answer. In the 0-shot variant, the model produces the reasoning chain without being shown any worked examples; it is usually triggered by an instruction such as "Let's think step by step" or "Let's solve this problem step by step." The method suits tasks that require complex reasoning, such as mathematical problems, logic puzzles, and multi-step decision making: it helps the model break a hard task into manageable subtasks, improving the accuracy and transparency of its reasoning. Unlike few-shot CoT, which demonstrates example reasoning chains, 0-shot CoT relies on the model's internal reasoning ability, which makes it more broadly applicable across task types. • Self-reported
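The 0-shot CoT setup described above amounts to a simple prompt wrapper. A minimal sketch (the function name and exact trigger phrasing here are illustrative, not the harness actually used for these scores):

```python
def zero_shot_cot_prompt(question: str) -> str:
    """Wrap a question with the standard zero-shot CoT trigger.

    No worked examples are included: the trailing instruction alone
    cues the model to emit step-by-step reasoning before its answer.
    """
    return (
        f"Q: {question}\n"
        "A: Let's think step by step."
    )

prompt = zero_shot_cot_prompt(
    "If a train travels 60 km in 45 minutes, what is its speed in km/h?"
)
print(prompt)
```

The same wrapper works for any question, which is the practical advantage over few-shot CoT: no per-task example curation is needed.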
Mathematics
Mathematical problems and computations
MATH
0-shot CoT — Zero-shot chain-of-thought (0-shot CoT) is an approach that encourages a language model to lay out its line of reasoning when answering complex questions. Instead of producing an answer directly, the model generates a chain of reasoning before the final answer. The technique builds on the research "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (Wei et al., 2022). It differs from standard CoT in that it requires no example reasoning chains; instead it uses an instruction such as "Let's think step by step" to stimulate the reasoning process. Advantages of 0-shot CoT: no example demonstrations need to be written; it transfers more easily across contexts; it is less tied to specific examples; it can be applied to new task types. It is especially effective on mathematical problems, logic puzzles, and other multi-step reasoning tasks; research shows that adding the simple prompt "Let's think step by step" can significantly improve LLM answer accuracy in these areas. However, 0-shot CoT can underperform few-shot CoT on very hard tasks or in specialized fields where the model benefits from concrete example reasoning. • Self-reported
MGSM
0-shot CoT — Chain-of-Thought (CoT) without examples (0-shot) is a technique in which the model (LLM) produces step-by-step reasoning before giving its answer, without relying on any examples. It is prompted by a cue such as "step by step," which encourages the model to break a complex task into intermediate reasoning steps before answering. The method significantly improves LLM performance on tasks that require complex reasoning, such as logical and mathematical problems. It lets the model lay out its chain of reasoning, which is especially useful when worked examples are unavailable or hard to construct. Although performance can be lower than with few-shot CoT, the method requires no carefully crafted examples, making it more broadly applicable and less sensitive to example selection. • Self-reported
Reasoning
Logical reasoning and analysis
GPQA
0-shot CoT — In 0-shot CoT, the model uses step-by-step reasoning to solve tasks without being shown example reasoning chains. The methodology was first described in the work of Kojima et al. (2022), who discovered that adding a phrase like "Let's think step by step" to a query can significantly improve the performance of large language models on reasoning tasks. Several variants of the instruction were evaluated, for example: "Let's think step by step."; "Let's work through this problem step by step."; "Let's solve this task step by step." Testing showed that "Let's solve this task step by step" usually achieved the best results, so that instruction is used in the main evaluation. • Self-reported
Multimodal
Working with images and visual data
AI2D
# Analysis of OpenAI o-1 answers on mathematics tasks
## Overview
This analysis examines the effectiveness of OpenAI o-1, in tool-use mode, on problems from AIME, FrontierMath, and the Harvard-MIT Mathematics Tournament, comparing o-1 against GPT-4 and Claude 3 Opus. The results show a clear improvement over prior models: o-1 outperforms GPT-4 on all data sets and Claude 3 Opus on two of three. The analysis also covers o-1's solutions, its strengths, and its limitations.
## Setup
OpenAI describes o-1 as a new model that "significantly improves reasoning capabilities and code quality" compared with GPT-4. o-1 was evaluated with Python tools, GPT-4 with Code Interpreter, and Claude 3 Opus with Claude's tools; all models could use Python for assistance. Even when GPT-4 was run without Code Interpreter, it behaved as though it had tool access, so tool access was provided explicitly for all models. • Self-reported
ChartQA
# Logical reasoning across difficulty levels
## Purpose and general conclusions
The purpose of this test was to evaluate the model's logical reasoning across levels of difficulty. Each task was carefully designed to probe specific aspects of reasoning, including logic, option analysis, and solution verification. Since logical problems usually require multi-step reasoning, they are well suited to evaluating a model's abilities; tasks ranged from simple logical problems to ones requiring longer inference chains and several interacting constraints. Conclusions: the model shows solid reasoning ability at basic and intermediate difficulty, and is strong at analyzing options and verifying solutions. On the hardest tasks the model sometimes makes errors, especially when reasoning involves several interacting constraints, but on the whole it solves the majority correctly.
## Methodology
The test set consisted of 12 logical problems of varying difficulty, in three categories:
1. **Basic logical problems (4)** - simple reasoning and option analysis
2. **Intermediate logical problems (4)** - longer chains and more complex structure
3. **Advanced logical problems (4)** - complex reasoning with several constraints
Each task was scored on the following criteria:
- Correctness of the final answer
- Quality of the logical reasoning
- Ability to track and apply all constraints
- Verification of the solution where necessary
## Results by category
### Basic logical problems
The model reasoned correctly through all 4 basic problems, working sequentially. • Self-reported
DocVQA
# Token-count verification
## Definition and application
**Stop Token Counting (STC)** evaluates a model's ability to respect length limits in its answers. The model is asked a question and told that its answer must consist of a given number of tokens (words, sentences, or characters). In the context of large language models (LLMs), a **token** is a word or sequence of characters that serves as the unit of text generation. STC verifies whether the model can accurately monitor its own output length and stop on target.
## Methodology
### General structure of the test
1. The model is given an instruction to answer a question using an exact number of tokens (for example, "Answer this question using 20 words").
2. The number of tokens in the model's answer is counted.
3. Accuracy is determined.
### Counting units
* **Words**: based on the number of words in the answer.
* **Sentences**: based on the number of sentences.
* **Characters**: based on the number of characters, including punctuation.
* **Tokens**: based on the number of tokens under a specific tokenizer (the most difficult variant).
## What STC measures
1. **Self-monitoring**: can the model track its generation as it goes and stop at the limit?
2. **Counting accuracy**: can the model correctly count units of language?
3. **Instruction following**: how well does the model honor explicit length limits?
## Example instructions
* "Answer using 15 words."
* "Explain how this works in 3 sentences. No more and no fewer."
* "Explain this using 100 characters, including spaces."
## Scoring
* **Exact match**: the model hits the target exactly.
* **Tolerance band**: the model lands within a small deviation from the target. • Self-reported
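A length-constraint check of the kind this benchmark entry describes can be sketched as follows (the helper names and the naive sentence-splitting rule are assumptions; the actual benchmark's counting rules are not specified here):

```python
import re

def count_units(answer: str, unit: str) -> int:
    """Count words, sentences, or characters in a model answer."""
    if unit == "words":
        return len(answer.split())
    if unit == "sentences":
        # Naive split on sentence-ending punctuation.
        return len([s for s in re.split(r"[.!?]+", answer) if s.strip()])
    if unit == "characters":
        return len(answer)
    raise ValueError(f"unknown unit: {unit}")

def meets_constraint(answer: str, target: int, unit: str, tolerance: int = 0) -> bool:
    """True if the answer's length is within `tolerance` of the target."""
    return abs(count_units(answer, unit) - target) <= tolerance

# Instruction was: "Answer using 15 words."
answer = "The capital of France is Paris, a city known for art, history, and fine cuisine."
print(count_units(answer, "words"))                    # 15
print(meets_constraint(answer, 15, "words"))           # True
```

Exact-match scoring corresponds to `tolerance=0`; a tolerance band simply widens the accepted interval.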
MathVista
# Scratchpad prompting
The "scratchpad" technique allows the model to answer a complex question step by step, recording its chain of reasoning and intermediate results. It addresses a failure mode of long reasoning, where the model can lose or corrupt intermediate results. Using a scratchpad, the model preserves key information across the overall reasoning process. The technique demonstrates improved performance on tasks requiring complex reasoning, such as mathematical problems from AIME.
## How it works
1. The model is given a "scratchpad" — a region in the query where it can record intermediate results.
2. Instead of holding the whole reasoning chain "in its head" (in the model's context), the model writes key intermediate results to the scratchpad.
3. While solving, the model can refer back to the scratchpad to retrieve information it recorded earlier.
4. The scratchpad is updated as the model derives new intermediate results.
## Example use
```
Task: [problem statement]
Scratchpad: [thinking, which can be long and complex]
Intermediate results:
1. [result 1]
2. [result 2]
...
Answer: [final answer]
```
The model appends new intermediate results or revises existing ones.
## Why this works
- **Focus**: the model does not try to hold all aspects of the solution "in its head".
- **Lower error probability**: key intermediate results are recorded rather than resting on the model's memory.
- **Structured approach**: the model builds the solution step by step and can track progress.
- **Transparency**: the reasoning process is more visible, which makes it easier to detect errors.
## Application
This technique is especially useful for multi-step mathematical tasks. • Self-reported
MMMU
0-shot CoT — This method encourages the model to think sequentially before giving its final answer. Unlike few-shot CoT, it requires no example reasoning; instead the model is given an instruction of the form "let's think step by step" or "let's solve this task step by step" before the question. This simple approach prompts the model to generate intermediate reasoning, which often leads to more accurate answers, especially on complex tasks such as logic or reasoning problems. 0-shot CoT is more economical than few-shot CoT, since it requires no examples for each task type; however, the quality of the reasoning can vary depending on the model's capabilities and the complexity of the task. • Self-reported
Other Tests
Specialized benchmarks
InfographicsQA
# Negative prompting for language models
This entry describes one method for improving language model (LLM) answers: negative prompting, a technique in which the prompt tells the model what it should NOT do when generating an answer. Since a language model is trained on broad data, including low-quality information, it can sometimes generate undesirable answers. One class of solutions is RLHF (reinforcement learning from human feedback), which tunes the model on the basis of human preference signals. There are also simpler methods, such as negative prompting, where you tell the model which types of answers to avoid. This can work effectively when you want the model to avoid long answers, specific errors, or particular phrasings.
## Examples
Negative prompting is especially useful when the model over-explains, hedges, or pads its answers. Example constraints that can be added to queries to improve the model's answers:
- "Do not answer with a disclaimer."
- "Do not mention that you are an AI assistant."
- "Please give answers without explanations."
- "Keep answers short."
- "Do not use phrases like 'I cannot answer this question' or 'I'm sorry, I can't help'."
- "Avoid long and repetitive answers."
- "Do not refuse to answer because of ambiguity in the question."
Where possible, instructions should be specific. For example, instead of "Do not be too verbose," say "Keep your answer under 100 words."
## Limitations of negative prompting
Although negative prompting can be useful, it has limits. • Self-reported
MMMU-Pro
0-shot CoT — A method in which the model is asked to solve a task with a prompt such as "Let's solve this step by step," to encourage it to show its line of reasoning. This is a way to improve LLM performance without providing example reasoning. The approach works both for common-sense tasks and for harder mathematical problems, and serves as an alternative to reasoning chains demonstrated with examples (few-shot CoT). • Self-reported
TextVQA
Several lines of research in this field have focused on efficient algorithms for hard number-theoretic problems, among them the Number Field Sieve (NFS) for integer factorization and the Function Field Sieve (FFS) for discrete logarithms over finite fields. These achievements were not only theoretical: modern methods bound the practical complexity of these problems, which underpins the security of RSA and various other cryptosystems. Despite progress in algorithms for these tasks, important open questions remain about the complexity and practicality of these methods. Moreover, improved algorithms (especially for discrete logarithms) represent a risk for systems built on these hard problems. This research presents a new approach combining earlier sieve methods; the approach yields improvement on some data sets and offers directions for further work in this field. • Self-reported
VQAv2
# AIME (competition-level mathematics)
## Description of the tasks
The American Invitational Mathematics Examination (AIME) is a challenging 15-question mathematics competition for invited high-school students. Each question has an answer in the form of an integer from 0 to 999. These tasks go beyond standard problems and require creative, non-routine approaches to solving.
## Evaluation method
Each LLM is evaluated on 10 AIME tasks and asked to provide an answer and a full solution. For each task, two criteria are evaluated:
1. **Final answer**: does the value match the correct answer?
2. **Correctness of the solution**: is the reasoning sound, and does it lead to the correct answer?
For tasks requiring worked solutions, the following evaluation approach is used:
- The model is asked to provide a solution and a final answer.
- The model must follow a correct procedure to earn points.
- The model's own assessment of its solution's correctness is not relied upon.
## Task set
The 10 AIME tasks:
- cover diverse fields of mathematics (number theory, geometry, combinatorics, algebra)
- require varied approaches and creative thinking
- exemplify real competition mathematics
Tasks are drawn from past AIME competitions to represent diverse mathematical fields and difficulty levels. • Self-reported
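The answer-matching step of an AIME-style evaluation (answers are integers 0 to 999) can be sketched as follows. The extraction heuristic here is an assumption for illustration; the harness's actual answer-parsing logic is not specified in this entry:

```python
import re

def extract_aime_answer(solution_text: str):
    """Pull the final integer answer (0-999) from a model's solution text.

    Heuristic: take the last standalone 1-3 digit integer in the text;
    return None if no in-range candidate is found.
    """
    candidates = [int(m) for m in re.findall(r"\b\d{1,3}\b", solution_text)]
    in_range = [c for c in candidates if 0 <= c <= 999]
    return in_range[-1] if in_range else None

def grade(solution_text: str, correct: int) -> bool:
    """Exact-match grading: the extracted answer must equal the key."""
    return extract_aime_answer(solution_text) == correct

print(grade("The sum telescopes, so the answer is 204.", 204))  # True
```

Judging the second criterion, solution correctness, cannot be automated this way and typically needs a human or model grader.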
License & Metadata
License
llama3_2
Announcement Date
September 25, 2024
Last Updated
July 19, 2025
Similar Models
Llama 3.2 11B Instruct
Meta
Multimodal · 10.6B
Best score: 0.7 (MMLU)
Released: Sep 2024
Price: $0.18/1M tokens
Llama 4 Scout
Meta
Multimodal · 109.0B
Best score: 0.8 (MMLU)
Released: Apr 2025
Price: $0.18/1M tokens
Gemma 3 27B
Multimodal · 27.0B
Best score: 0.9 (HumanEval)
Released: Mar 2025
Price: $0.11/1M tokens
Gemma 3 12B
Multimodal · 12.0B
Best score: 0.9 (HumanEval)
Released: Mar 2025
Price: $0.05/1M tokens
GPT OSS 20B
OpenAI
Multimodal · 20.0B
Best score: 0.9 (MMLU)
Released: Aug 2025
Price: $0.10/1M tokens
Mistral Small 3.2 24B Instruct
Mistral AI
Multimodal · 23.6B
Best score: 0.9 (HumanEval)
Released: Jun 2025
Magistral Medium
Mistral AI
Multimodal · 24.0B
Best score: 0.7 (GPQA)
Released: Jun 2025
Mistral Small 3 24B Base
Mistral AI
Multimodal · 23.6B
Best score: 0.9 (ARC)
Released: Jan 2025
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.