
Llama 4 Maverick

Multimodal
Meta

Llama 4 Maverick is a natively multimodal model capable of processing both text and images. It uses a Mixture-of-Experts (MoE) architecture with 17 billion active parameters and 128 experts, supporting a wide range of multimodal tasks such as conversational interaction, image analysis, and code generation. The model features a 1 million token context window.

Key Specifications

Parameters
400.0B
Context
1.0M
Release Date
April 5, 2025
Average Score
71.8%

Timeline

Key dates in the model's history
Announcement
April 5, 2025
Last Update
July 19, 2025
Today
March 25, 2026

Technical Specifications

Parameters
400.0B
Training Tokens
22.0T tokens
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$0.27
Output (per 1M tokens)
$0.85
Max Input Tokens
1.0M
Max Output Tokens
1.0M
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
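Given the listed rates ($0.27 per 1M input tokens, $0.85 per 1M output tokens), the cost of a request can be estimated directly. A minimal sketch; the token counts in the example are hypothetical:

```python
# Estimate request cost from the listed per-1M-token rates.
INPUT_RATE_PER_M = 0.27   # USD per 1M input tokens
OUTPUT_RATE_PER_M = 0.85  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# Example: a 100k-token prompt with a 2k-token completion.
cost = request_cost(100_000, 2_000)
print(f"${cost:.4f}")  # → $0.0287
```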

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
MMLU
5-shot macro_avg/acc_char: the model is given 5 in-context examples, and character-level accuracy is macro-averaged across subjects. Self-reported
85.5%

Programming

Programming skills tests
MBPP
3-shot pass@1: the model's ability to solve a task correctly on the first attempt is evaluated as follows: 1. The model is shown 3 example solutions for the task type. 2. It is then given a new task of the same type. 3. It must solve that task correctly on its first attempt, with no iteration or retries. This measures how well the model learns from in-context examples and follows a reasoning pattern. Because it requires success on the first attempt, pass@1 better matches real-world use, where users typically do not iterate to obtain a correct answer. Self-reported
77.6%
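The pass@1 metric described above reduces to the fraction of problems whose first generated solution passes; a minimal sketch, where the per-problem results list is hypothetical:

```python
def pass_at_1(first_attempt_passed: list[bool]) -> float:
    """pass@1: fraction of problems solved correctly on the first
    attempt, with no retries or iteration."""
    if not first_attempt_passed:
        return 0.0
    return sum(first_attempt_passed) / len(first_attempt_passed)

# Example: 3 of 4 problems solved on the first try.
print(pass_at_1([True, True, False, True]))  # → 0.75
```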

Mathematics

Mathematical problems and computations
MATH
4-shot em_maj1@1: the model answers each example 4 times over a set of prompts, and the answer produced at least twice is taken as the final prediction (majority vote). The metric is the proportion of examples whose final answer exactly matches the reference (1.0 if all examples are answered correctly). If no answer reaches a majority for an example, the model is counted as incorrect on that example. Self-reported
61.2%
MGSM
0-shot CoT: this method encourages the model to generate step-by-step reasoning before answering, without requiring any demonstrations. It is done through the query itself. For example, instead of asking "What is the value of 536 - 317?", we can ask "What is the value of 536 - 317? Let's solve this step by step." A cue such as "Let's solve this step by step" prompts the model to produce a chain of reasoning, which leads to more accurate answers. The chain of reasoning lets the model break a complex task into simpler components and solve them sequentially, which usually improves performance on tasks requiring several steps of thinking. In addition, the step-by-step reasoning makes it easier to see how the model arrived at its answer. The method is especially effective for logical and multi-step reasoning tasks, and can be applied without providing any demonstrations. Self-reported
92.3%
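The 0-shot CoT prompting described above amounts to appending a reasoning cue to the bare question; no model call is shown here, only the prompt construction, with the cue wording taken from the description:

```python
def zero_shot_cot(question: str,
                  cue: str = "Let's solve this step by step.") -> str:
    """Build a zero-shot Chain-of-Thought prompt: the question followed
    by a cue eliciting step-by-step reasoning, with no examples."""
    return f"{question}\n{cue}"

print(zero_shot_cot("What is the value of 536 - 317?"))
# What is the value of 536 - 317?
# Let's solve this step by step.
```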

Reasoning

Logical reasoning and analysis
GPQA
0-shot CoT: in 0-shot CoT we append the prompt "Let's reason step by step" to the query. Research has shown that adding this phrase can improve a model's reasoning when answering complex questions: it encourages the model to think more sequentially, which often leads to higher accuracy, particularly on tasks requiring multi-step solutions. Self-reported
69.8%

Multimodal

Working with images and visual data
ChartQA
0-shot CoT: Chain-of-Thought prompting without examples, where the model reasons through the problem on its own, typically prompted with a cue such as "Let's think step by step" or "Let's solve this task sequentially". Self-reported
90.0%
DocVQA
0-shot CoT: the model reasons through intermediate steps, breaking the solution into stages that can be more or less detailed. In 0-shot CoT the model simply generates step-by-step reasoning without instructions or demonstrations, e.g. in response to a prompt of the form "Question: [question]?". Step-by-step reasoning emerges on its own, especially in larger LLMs, when the task is complex (as in mathematical problems) or requires several steps. The effectiveness of this method is usually lower than that of methods that explicitly prompt for step-by-step reasoning (for example, few-shot CoT or Zero-shot-CoT with an added cue). Self-reported
94.4%
MathVista
0-shot CoT: Zero-shot Chain-of-Thought is a prompting method in which the model is led to solve problems step by step without being shown demonstration examples of such reasoning. This is done with simple cues, such as "Let's solve this step by step" or "Let's think about this", included in the query. The cue significantly improves performance compared with a plain query without additional instructions, because the model produces intermediate reasoning before the answer. When the model first lays out its reasoning, it often achieves higher accuracy, especially on tasks requiring complex computations or analysis. 0-shot CoT is especially useful when there is no way to provide demonstration examples, as in few-shot CoT. Self-reported
73.7%
MMMU
Reasoning with 0-shot CoT encourages the language model to generate a chain of reasoning relying only on an instruction, without using examples. The instruction can be, for example: "Let's think step by step". This makes the model reason explicitly before giving its final answer, and leads to higher accuracy compared with answering directly. Reasoning with 0-shot CoT is especially useful when: 1. Example reasoning chains are hard to provide, or providing them could bias the model. 2. Solutions are too long for worked examples to fit. 3. The task requires flexible approaches to reasoning. This method was first presented by Wei et al. (2022), which demonstrated its effectiveness on tasks involving common-sense meaning and reasoning. Self-reported
73.4%

Other Tests

Specialized benchmarks
LiveCodeBench
0-shot CoT: prompting the LLM to answer with reasoning but without examples. The method consists of adding a phrase such as "Let's think step by step" at the end of the query. This encourages the model to use a more deliberate reasoning process, decomposing a complex task into a sequence of steps. Unlike few-shot CoT, which requires demonstration examples with reasoning, 0-shot CoT requires no examples. It is especially useful for tasks that demand reasoning, such as math problems, logical puzzles, and multi-step inference. The method, first described by Kojima et al. (2022), significantly improves performance on reasoning tasks simply through a prompt that encourages the model to think more carefully. Self-reported
43.4%
MMLU-Pro
0-shot CoT: 0-shot Chain-of-Thought (chain reasoning without examples) is a method in which the LLM is led to reason step by step before its final answer, but without being given examples of how to perform the reasoning. Usually this is done by appending a cue to the query such as "Let's solve this step by step" or "Let's think about this". Our evaluations used the following query template: ``` [Question] Let's solve this step by step. ``` This method sits between the plain approach (just asking the question) and more complex methods that include examples or detailed instructions. Self-reported
80.5%
MMMU-Pro
0-shot CoT: Zero-shot Chain-of-Thought is a method that encourages the LLM to "think step by step" when answering a question, without showing specific examples of such thinking. Unlike standard prompts, which simply ask for an answer, and few-shot CoT, which shows examples of step-by-step reasoning, 0-shot CoT contains only a cue like "let's reason step by step" before or after the question. Key properties: no examples, since it requires no demonstrations of step-by-step reasoning; a single change to the prompt improves results; gains on reasoning tasks, although not as large as with few-shot CoT. Limitations: less effective than few-shot CoT, especially on complex tasks; does not always lead to correct reasoning or answers; effectiveness depends on the specific task and model. Typical prompts: "Let's reason step by step", "Let's work over this step by step", "Let's solve this task step by step". Justification: Kojima et al. (2022), "Large Language Models are Zero-Shot Reasoners", showed that this prompt can substantially improve a model's reasoning ability without the need for examples. Self-reported
59.6%
TydiQA
1-shot average/f1. Self-reported
31.7%

License & Metadata

License
Llama 4 Community License Agreement
Announcement Date
April 5, 2025
Last Updated
July 19, 2025
