
Gemma 3n E2B Instructed LiteRT (Preview)

Multimodal
Google

Gemma 3n is a generative AI model optimized for everyday devices such as phones, laptops, and tablets. The model incorporates innovations like Per-Layer Embedding (PLE) parameter caching and the MatFormer model architecture to reduce compute and memory requirements. Gemma 3n models can process audio, text, and visual data, though this E2B LiteRT preview currently supports only text and image input. Gemma is a family of lightweight, state-of-the-art open models from Google, built on the same research and technology used to create the Gemini models, and licensed for responsible commercial use.
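As context for the on-device focus, here is a minimal sketch of how one might run a LiteRT-packaged Gemma model on Android with Google's MediaPipe LLM Inference API. The model path and token budget below are illustrative assumptions, not values taken from this page:

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Minimal sketch: load a LiteRT-packaged Gemma bundle from local storage
// and generate a single response with the MediaPipe LLM Inference API.
fun runGemma(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma-3n-e2b.task") // hypothetical on-device path
        .setMaxTokens(512) // combined prompt + response token budget (illustrative)
        .build()

    val llm = LlmInference.createFromOptions(context, options)
    return llm.generateResponse(prompt) // blocking, single response
}
```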

Key Specifications

Parameters
1.9B
Context
-
Release Date
May 20, 2025
Average Score
43.9%

Timeline

Key dates in the model's history
Announcement
May 20, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
1.9B
Training Tokens
-
Knowledge Cutoff
June 1, 2024
Family
-
Capabilities
Multimodal, ZeroEval

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
HellaSwag
10-shot accuracy (Self-reported)
72.2%
MMLU
0-shot accuracy (Self-reported)
60.1%
Winogrande
5-shot accuracy (Self-reported)
66.8%

Programming

Programming skills tests
HumanEval
0-shot pass@1 (Self-reported). Pass@1 measures whether the model's first generated solution passes the task's tests, with no example solutions to similar problems provided; a high 0-shot pass@1 indicates the model can apply its internal knowledge to new coding tasks without in-context examples (see the estimator sketch below).
66.5%
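As an aside, pass@k is conventionally computed with the unbiased estimator from Chen et al. (2021); a minimal Kotlin sketch, where pass@1 with one sample per task reduces to a per-task pass/fail average:

```kotlin
// Unbiased pass@k estimator (Chen et al., 2021): given n sampled solutions
// for a task, c of which pass the tests, estimate the probability that at
// least one of k randomly drawn samples is correct.
fun passAtK(n: Int, c: Int, k: Int): Double {
    if (n - c < k) return 1.0 // fewer than k failures: a correct sample is always drawn
    var noneCorrect = 1.0
    for (i in n - c + 1..n) {
        noneCorrect *= 1.0 - k.toDouble() / i
    }
    return 1.0 - noneCorrect
}

fun main() {
    println(passAtK(n = 1, c = 1, k = 1))  // 1.0 -- single attempt, passed
    println(passAtK(n = 10, c = 3, k = 1)) // 0.3 -- equals c/n when k = 1
}
```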
MBPP
3-shot pass@1 (Self-reported). The model is shown three worked examples before each problem, and pass@1 measures whether its first generated solution is correct; this tests the model's ability to learn from in-context examples rather than from internal knowledge alone.
56.6%

Mathematics

Mathematical problems and computations
MGSM
0-shot accuracy (Self-reported)
53.1%

Reasoning

Logical reasoning and analysis
BIG-Bench Hard
Few-shot accuracy (Self-reported). Measures how well the model performs a new task given only a handful of in-context demonstrations, testing in-context learning and generalization rather than task-specific fine-tuning.
44.3%
DROP
1-shot F1 (Self-reported). Answers are scored with token-level F1 against the reference, which credits responses that are correct but phrased differently; DROP answers are often text spans rather than bare numbers, so exact-match scoring would be too strict (a minimal F1 sketch follows this entry).
53.9%
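A minimal sketch of token-overlap F1 in the style of extractive-QA scoring; the official DROP scorer adds number and punctuation normalization that this version omits:

```kotlin
// Token-overlap F1: lower-case, split on whitespace, then compute F1 over
// the multiset intersection of prediction and gold tokens.
fun tokenF1(prediction: String, gold: String): Double {
    val predTokens = prediction.lowercase().split(Regex("\\s+")).filter { it.isNotEmpty() }
    val goldTokens = gold.lowercase().split(Regex("\\s+")).filter { it.isNotEmpty() }
    if (predTokens.isEmpty() || goldTokens.isEmpty()) {
        return if (predTokens == goldTokens) 1.0 else 0.0
    }

    // Count shared tokens up to their frequency on each side.
    val goldCounts = goldTokens.groupingBy { it }.eachCount().toMutableMap()
    var common = 0
    for (t in predTokens) {
        val left = goldCounts[t] ?: 0
        if (left > 0) { common++; goldCounts[t] = left - 1 }
    }
    if (common == 0) return 0.0

    val precision = common.toDouble() / predTokens.size
    val recall = common.toDouble() / goldTokens.size
    return 2 * precision * recall / (precision + recall)
}

fun main() {
    println(tokenF1("the 1923 treaty", "treaty of 1923")) // ~0.667: partial overlap
}
```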
GPQA
Diamond subset, 0-shot RelaxedAccuracy (Self-reported). RelaxedAccuracy accepts an answer as correct when it falls within a small tolerance of the reference (for example, ±1% for numeric answers) or is an equivalent representation of the same value, rather than requiring an exact string match. This suits tasks where several solution paths and answer formats are valid, and where the goal is to assess understanding rather than formatting.
24.8%
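The page does not publish the exact relaxation rules behind this score; the sketch below is one plausible reading, assuming a ±1% numeric tolerance and light string normalization, not the official implementation:

```kotlin
// Hypothetical "relaxed" match: numeric answers count as correct within a
// relative tolerance; string answers compare case-insensitively after
// stripping trailing '%' / '.' and thousands separators.
fun relaxedMatch(prediction: String, reference: String, tolerance: Double = 0.01): Boolean {
    val p = prediction.trim().trimEnd('%', '.').replace(",", "")
    val r = reference.trim().trimEnd('%', '.').replace(",", "")

    val pNum = p.toDoubleOrNull()
    val rNum = r.toDoubleOrNull()
    if (pNum != null && rNum != null) {
        // Relative tolerance; require exact match when the reference is zero.
        return if (rNum == 0.0) pNum == 0.0
               else kotlin.math.abs(pNum - rNum) / kotlin.math.abs(rNum) <= tolerance
    }
    return p.equals(r, ignoreCase = true)
}

fun main() {
    println(relaxedMatch("3.141", "3.14159")) // true: within 1%
    println(relaxedMatch("42%", "42"))        // true after normalization
    println(relaxedMatch("Paris", "paris"))   // true: case-insensitive
}
```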

Other Tests

Specialized benchmarks
AIME 2025
0-shot accuracy (Self-reported). The model receives competition problems with no examples, hints, or problem-specific context, so the score reflects its out-of-the-box multi-step mathematical reasoning.
6.7%
ARC-C
25-shot accuracy (Self-reported)
51.7%
ARC-E
0-shot accuracy (Self-reported)
75.8%
BoolQ
0-shot accuracy (Self-reported)
76.4%
Codegolf v2.2
0-shot pass@1 (Self-reported). The model must produce a passing solution on its first attempt, without examples or task-specific tuning.
11.0%
ECLeKTic
0-shot ECLeKTic score (Self-reported). ECLeKTic is a closed-book question-answering benchmark of cross-lingual knowledge transfer: it probes whether facts the model learned in one language can be recalled when the question is asked in another.
2.5%
Global-MMLU
0-shot accuracy (Self-reported)
55.1%
Global-MMLU-Lite
0-shot accuracy (Self-reported)
59.0%
HiddenMath
0-shot accuracy (Self-reported)
27.7%
Include
0-shot accuracy (Self-reported)
38.6%
LiveCodeBench
0-shot pass@1 (Self-reported). Pass@1 is a strict metric: it counts only problems solved correctly on the first attempt, with no retries, examples, or hints, so it directly reflects out-of-the-box problem-solving ability.
13.2%
LiveCodeBench v5
0-shot pass@1 (Self-reported). A pass@1 of 0.75, for example, means the model produces a correct solution on its first attempt for 75% of the problems.
18.6%
MMLU-Pro
0-shot accuracy (Self-reported). Measures baseline accuracy on the task without any worked examples of what a correct answer should look like.
40.5%
MMLU-ProX
0-shot accuracy (Self-reported)
8.1%
Natural Questions
5-shot accuracy (Self-reported)
15.5%
PIQA
0-shot accuracy (Self-reported). The model receives only the question, with no solved examples, so the score measures how well it generalizes existing commonsense knowledge to tasks it sees for the first time.
78.9%
Social IQa
0-shot accuracy (Self-reported)
48.8%
TriviaQA
5-shot accuracy (Self-reported)
60.8%
WMT24++
ChrF, 0-shot (Self-reported). ChrF is a character-level F-score over n-grams (typically up to order 6) between the model output and the reference translation, computed here without any task-specific tuning. It works particularly well for morphologically rich languages, where word-level metrics penalize valid surface variants (a minimal implementation sketch follows).
42.7%
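For reference, a simplified ChrF sketch under common defaults (n-gram order 6, β = 2, whitespace stripped); sacreBLEU is the reference implementation, and this version omits its smoothing details:

```kotlin
import kotlin.math.min

// Simplified ChrF: average character n-gram precision and recall over
// orders 1..6, then combine with beta = 2 (recall weighted more heavily).
fun chrF(hypothesis: String, reference: String, maxOrder: Int = 6, beta: Double = 2.0): Double {
    fun ngrams(s: String, n: Int): Map<String, Int> =
        (0..s.length - n).map { s.substring(it, it + n) }
            .groupingBy { it }.eachCount()

    val hyp = hypothesis.replace(Regex("\\s+"), "")
    val ref = reference.replace(Regex("\\s+"), "")

    var precisionSum = 0.0
    var recallSum = 0.0
    var orders = 0
    for (n in 1..maxOrder) {
        val h = ngrams(hyp, n)
        val r = ngrams(ref, n)
        val hTotal = h.values.sum()
        val rTotal = r.values.sum()
        if (hTotal == 0 || rTotal == 0) continue
        val overlap = h.entries.sumOf { (gram, count) -> min(count, r[gram] ?: 0) }
        precisionSum += overlap.toDouble() / hTotal
        recallSum += overlap.toDouble() / rTotal
        orders++
    }
    if (orders == 0) return 0.0

    val p = precisionSum / orders
    val r = recallSum / orders
    if (p == 0.0 && r == 0.0) return 0.0
    val b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r) * 100 // scaled to 0-100 like reported scores
}

fun main() {
    println(chrF("the cat sat on the mat", "the cat is on the mat"))
}
```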

License & Metadata

License
gemma
Announcement Date
May 20, 2025
Last Updated
July 19, 2025

Similar Models


Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.