
Gemma 3n E4B Instruct

Multimodal
Google

Gemma 3n is a multimodal model designed for local hardware deployment, supporting image, text, audio, and video inputs and generating text outputs. It comprises a language decoder, an audio encoder, and an image encoder, and is available in two sizes: E2B and E4B. The models are optimized for efficient memory usage, allowing them to run on devices with limited GPU RAM.

Gemma is a family of lightweight, state-of-the-art open models from Google, built on the same research and technology used to create the Gemini models. Gemma models are well-suited for a variety of content-understanding tasks, including question answering, summarization, and reasoning. Their relatively small size enables deployment in resource-constrained environments such as laptops, desktops, or private cloud infrastructure, democratizing access to state-of-the-art AI and fostering innovation for everyone. Open weights are provided for the instruction-tuned variants, and the models were trained on data covering over 140 spoken languages.

Key Specifications

Parameters
8.0B
Context
32.0K
Release Date
June 26, 2025
Average Score
42.0%

Timeline

Key dates in the model's history
Announcement
June 26, 2025
Last Update
July 19, 2025
Today
March 25, 2026

Technical Specifications

Parameters
8.0B
Training Tokens
11.0T tokens
Knowledge Cutoff
June 1, 2024
Family
-
Capabilities
Multimodal · ZeroEval

Pricing & Availability

Input (per 1M tokens)
$20.00
Output (per 1M tokens)
$40.00
Max Input Tokens
32.0K
Max Output Tokens
32.0K
Supported Features
Function Calling · Structured Output · Code Execution · Web Search · Batch Inference · Fine-tuning
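
The listed per-token rates translate into a per-request cost in the obvious way; a minimal sketch (the prices come from the table above, the function name is illustrative):

```python
INPUT_PRICE_PER_M = 20.00   # USD per 1M input tokens (from the table above)
OUTPUT_PRICE_PER_M = 40.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a request with 10,000 input tokens and 2,000 output tokens
cost = request_cost(10_000, 2_000)  # → 0.28 USD
```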

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
MMLU
Accuracy. 0-shot. Self-reported
64.9%

Programming

Programming skills tests
HumanEval
pass@1. 0-shot. Self-reported
75.0%
MBPP
pass@1. 3-shot (3 examples provided). Self-reported
63.6%
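
The pass@1 scores above report the fraction of problems solved by a generated solution. A minimal sketch of the standard unbiased pass@k estimator, under the assumption that these benchmarks follow the usual sampling setup (function name illustrative):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: probability that at least one of k
    samples drawn from n generated solutions (of which c are correct)
    passes the tests."""
    if n - c < k:
        return 1.0  # fewer failures than draws: a correct sample is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With one sample per problem (n = 1, k = 1), pass@1 reduces to the plain fraction of problems whose single generated solution passes.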

Mathematics

Mathematical problems and computations
MGSM
Accuracy. 0-shot. Self-reported
67.0%

Reasoning

Logical reasoning and analysis
GPQA
Diamond subset. 0-shot. Self-reported
23.7%

Other Tests

Specialized benchmarks
AIME 2025
Accuracy. 0-shot. Self-reported
11.6%
Codegolf v2.2
pass@1. 0-shot. Self-reported
16.8%
ECLeKTic
0-shot. Self-reported
19.0%
Global-MMLU
0-shot. Self-reported
60.3%
Global-MMLU-Lite
Accuracy. 0-shot. Self-reported
64.5%
HiddenMath
Accuracy. 0-shot. Self-reported
37.7%
Include
0-shot. Self-reported
57.2%
LiveCodeBench
pass@1. 0-shot. Self-reported
13.2%
LiveCodeBench v5
pass@1. 0-shot. Self-reported
25.7%
MMLU-Pro
Accuracy. Self-reported
50.6%
MMLU-ProX
0-shot. Self-reported
19.9%
OpenAI MMLU
0-shot. Self-reported
35.6%
WMT24++
Character-level F-score (chrF). 0-shot. Self-reported
50.1%

License & Metadata

License
proprietary
Announcement Date
June 26, 2025
Last Updated
July 19, 2025

Similar Models

All Models

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.