
Llama 4 Maverick

Multimodal
Meta

Llama 4 Maverick is a natively multimodal model capable of processing both text and images. It uses a Mixture-of-Experts (MoE) architecture with 17 billion active parameters and 128 experts, supporting a wide range of multimodal tasks such as conversational interaction, image analysis, and code generation. The model features a 1 million token context window.

Key Specifications

Parameters
400.0B
Context
1.0M
Release Date
April 5, 2025
Average Score
71.8%

Timeline

Key dates in the model's history
Announcement
April 5, 2025
Last Update
July 19, 2025
Today
March 25, 2026

Technical Specifications

Parameters
400.0B
Training Tokens
22.0T tokens
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$0.27
Output (per 1M tokens)
$0.85
Max Input Tokens
1.0M
Max Output Tokens
1.0M
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
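Given the listed rates ($0.27 per 1M input tokens, $0.85 per 1M output tokens), the cost of a request can be estimated directly. A minimal sketch; the token counts in the example are hypothetical:

```python
# Estimate request cost from the listed per-1M-token rates.
INPUT_RATE_PER_M = 0.27   # USD per 1M input tokens
OUTPUT_RATE_PER_M = 0.85  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# Example: a 100k-token prompt with a 2k-token completion.
cost = request_cost(100_000, 2_000)
print(f"${cost:.4f}")  # → $0.0287
```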

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
MMLU
5-shot macro_avg/acc_char: the model is given 5 in-context examples, and character-level accuracy is macro-averaged across subjects. Self-reported
85.5%

Programming

Programming skills tests
MBPP
3-shot pass@1: the model's ability to solve a task correctly on the first attempt is evaluated as follows: 1. The model is shown 3 example solutions for the task type. 2. It is then given a new task of the same type. 3. It must solve that task correctly on its first attempt, with no iteration or retries. This measures how well the model learns from in-context examples and follows a reasoning pattern. Because it requires success on the first attempt, pass@1 better matches real-world use, where users typically do not iterate to obtain a correct answer. Self-reported
77.6%
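The pass@1 metric described above reduces to the fraction of problems whose first generated solution passes; a minimal sketch, where the per-problem results list is hypothetical:

```python
def pass_at_1(first_attempt_passed: list[bool]) -> float:
    """pass@1: fraction of problems solved correctly on the first
    attempt, with no retries or iteration."""
    if not first_attempt_passed:
        return 0.0
    return sum(first_attempt_passed) / len(first_attempt_passed)

# Example: 3 of 4 problems solved on the first try.
print(pass_at_1([True, True, False, True]))  # → 0.75
```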

Mathematics

Mathematical problems and computations
MATH
4-shot em_maj1@1: the model answers each example 4 times over a set of prompts, and the answer produced at least twice is taken as the final prediction (majority vote). The metric is the proportion of examples whose final answer exactly matches the reference (1.0 if all examples are answered correctly). If no answer reaches a majority for an example, the model is counted as incorrect on that example. Self-reported
61.2%
MGSM
0-shot CoT: this method encourages the model to generate step-by-step reasoning before answering, without requiring any demonstrations. It is done through the query itself. For example, instead of asking "What is the value of 536 - 317?", we can ask "What is the value of 536 - 317? Let's solve this step by step." A cue such as "Let's solve this step by step" prompts the model to produce a chain of reasoning, which leads to more accurate answers. The chain of reasoning lets the model break a complex task into simpler components and solve them sequentially, which usually improves performance on tasks requiring several steps of thinking. In addition, the step-by-step reasoning makes it easier to see how the model arrived at its answer. The method is especially effective for logical and multi-step reasoning tasks, and can be applied without providing any demonstrations. Self-reported
92.3%
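The 0-shot CoT prompting described above amounts to appending a reasoning cue to the bare question; no model call is shown here, only the prompt construction, with the cue wording taken from the description:

```python
def zero_shot_cot(question: str,
                  cue: str = "Let's solve this step by step.") -> str:
    """Build a zero-shot Chain-of-Thought prompt: the question followed
    by a cue eliciting step-by-step reasoning, with no examples."""
    return f"{question}\n{cue}"

print(zero_shot_cot("What is the value of 536 - 317?"))
# What is the value of 536 - 317?
# Let's solve this step by step.
```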

Reasoning

Logical reasoning and analysis
GPQA
0-shot CoT: in 0-shot CoT we append the prompt "Let's reason step by step" to the query. Research has shown that adding this phrase can improve a model's reasoning when answering complex questions: it encourages the model to think more sequentially, which often leads to higher accuracy, particularly on tasks requiring multi-step solutions. Self-reported
69.8%

Multimodal

Working with images and visual data
ChartQA
0-shot CoT: Chain-of-Thought prompting without examples, where the model reasons through the problem on its own, typically prompted with a cue such as "Let's think step by step" or "Let's solve this task sequentially". Self-reported
90.0%
DocVQA
0-shot CoT: the model reasons through intermediate steps, breaking the solution into stages that can be more or less detailed. In 0-shot CoT the model simply generates step-by-step reasoning without instructions or demonstrations, e.g. in response to a prompt of the form "Question: [question]?". Step-by-step reasoning emerges on its own, especially in larger LLMs, when the task is complex (as in mathematical problems) or requires several steps. The effectiveness of this method is usually lower than that of methods that explicitly prompt for step-by-step reasoning (for example, few-shot CoT or Zero-shot-CoT with an added cue). Self-reported
94.4%
MathVista
0-shot CoT: Zero-shot Chain-of-Thought is a prompting method in which the model is led to solve problems step by step without being shown demonstration examples of such reasoning. This is done with simple cues, such as "Let's solve this step by step" or "Let's think about this", included in the query. The cue significantly improves performance compared with a plain query without additional instructions, because the model produces intermediate reasoning before the answer. When the model first lays out its reasoning, it often achieves higher accuracy, especially on tasks requiring complex computations or analysis. 0-shot CoT is especially useful when there is no way to provide demonstration examples, as in few-shot CoT. Self-reported
73.7%
MMMU
Reasoning with 0-shot CoT encourages the language model to generate a chain of reasoning relying only on an instruction, without using examples. The instruction can be, for example: "Let's think step by step". This makes the model reason explicitly before giving its final answer, and leads to higher accuracy compared with answering directly. Reasoning with 0-shot CoT is especially useful when: 1. Example reasoning chains are hard to provide, or providing them could bias the model. 2. Solutions are too long for worked examples to fit. 3. The task requires flexible approaches to reasoning. This method was first presented by Wei et al. (2022), which demonstrated its effectiveness on tasks involving common-sense meaning and reasoning. Self-reported
73.4%

Other Tests

Specialized benchmarks
LiveCodeBench
0-shot CoT: prompting the LLM to answer with reasoning but without examples. The method consists of adding a phrase such as "Let's think step by step" at the end of the query. This encourages the model to use a more deliberate reasoning process, decomposing a complex task into a sequence of steps. Unlike few-shot CoT, which requires demonstration examples with reasoning, 0-shot CoT requires no examples. It is especially useful for tasks that demand reasoning, such as math problems, logical puzzles, and multi-step inference. The method, first described by Kojima et al. (2022), significantly improves performance on reasoning tasks simply through a prompt that encourages the model to think more carefully. Self-reported
43.4%
MMLU-Pro
0-shot CoT: 0-shot Chain-of-Thought (chain reasoning without examples) is a method in which the LLM is led to reason step by step before its final answer, but without being given examples of how to perform the reasoning. Usually this is done by appending a cue to the query such as "Let's solve this step by step" or "Let's think about this". Our evaluations used the following query template: ``` [Question] Let's solve this step by step. ``` This method sits between the plain approach (just asking the question) and more complex methods that include examples or detailed instructions. Self-reported
80.5%
MMMU-Pro
0-shot CoT: Zero-shot Chain-of-Thought is a method that encourages the LLM to "think step by step" when answering a question, without showing specific examples of such thinking. Unlike standard prompts, which simply ask for an answer, and few-shot CoT, which shows examples of step-by-step reasoning, 0-shot CoT contains only a cue like "let's reason step by step" before or after the question. Key properties: no examples, since it requires no demonstrations of step-by-step reasoning; a single change to the prompt improves results; gains on reasoning tasks, although not as large as with few-shot CoT. Limitations: less effective than few-shot CoT, especially on complex tasks; does not always lead to correct reasoning or answers; effectiveness depends on the specific task and model. Typical prompts: "Let's reason step by step", "Let's work over this step by step", "Let's solve this task step by step". Justification: Kojima et al. (2022), "Large Language Models are Zero-Shot Reasoners", showed that this prompt can substantially improve a model's reasoning ability without the need for examples. Self-reported
59.6%
TydiQA
1-shot average/f1. Self-reported
31.7%

License & Metadata

License
Llama 4 Community License Agreement
Announcement Date
April 5, 2025
Last Updated
July 19, 2025
