
Gemma 3n E2B Instructed LiteRT (Preview)

Multimodal
Google

Gemma 3n is a generative AI model optimized for everyday devices such as phones, laptops, and tablets. The model incorporates innovations like Per-Layer Embedding (PLE) parameter caching and the MatFormer model architecture to reduce compute and memory requirements. Gemma 3n models can process audio, text, and visual data, though this E2B LiteRT preview currently supports only text and image input. Gemma is a family of lightweight, state-of-the-art open models from Google, built on the same research and technology used to create the Gemini models, and licensed for responsible commercial use.
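As context for the on-device focus, here is a minimal sketch of how one might run a LiteRT-packaged Gemma model on Android with Google's MediaPipe LLM Inference API. The model path and token budget below are illustrative assumptions, not values taken from this page:

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Minimal sketch: load a LiteRT-packaged Gemma bundle from local storage
// and generate a single response with the MediaPipe LLM Inference API.
fun runGemma(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma-3n-e2b.task") // hypothetical on-device path
        .setMaxTokens(512) // combined prompt + response token budget (illustrative)
        .build()

    val llm = LlmInference.createFromOptions(context, options)
    return llm.generateResponse(prompt) // blocking, single response
}
```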

Key Specifications

Parameters
1.9B
Context
-
Release Date
May 20, 2025
Average Score
43.9%

Timeline

Key dates in the model's history
Announcement
May 20, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
1.9B
Training Tokens
-
Knowledge Cutoff
June 1, 2024
Family
-
Capabilities
Multimodal, ZeroEval

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
HellaSwag
10-shot accuracy (Self-reported)
72.2%
MMLU
0-shot accuracy (Self-reported)
60.1%
Winogrande
5-shot accuracy (Self-reported)
66.8%

Programming

Programming skills tests
HumanEval
0-shot pass@1 (Self-reported). Pass@1 measures whether the model's first generated solution passes the task's tests, with no example solutions to similar problems provided; a high 0-shot pass@1 indicates the model can apply its internal knowledge to new coding tasks without in-context examples (see the estimator sketch below).
66.5%
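As an aside, pass@k is conventionally computed with the unbiased estimator from Chen et al. (2021); a minimal Kotlin sketch, where pass@1 with one sample per task reduces to a per-task pass/fail average:

```kotlin
// Unbiased pass@k estimator (Chen et al., 2021): given n sampled solutions
// for a task, c of which pass the tests, estimate the probability that at
// least one of k randomly drawn samples is correct.
fun passAtK(n: Int, c: Int, k: Int): Double {
    if (n - c < k) return 1.0 // fewer than k failures: a correct sample is always drawn
    var noneCorrect = 1.0
    for (i in n - c + 1..n) {
        noneCorrect *= 1.0 - k.toDouble() / i
    }
    return 1.0 - noneCorrect
}

fun main() {
    println(passAtK(n = 1, c = 1, k = 1))  // 1.0 -- single attempt, passed
    println(passAtK(n = 10, c = 3, k = 1)) // 0.3 -- equals c/n when k = 1
}
```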
MBPP
3-shot pass@1 (Self-reported). The model is shown three worked examples before each problem, and pass@1 measures whether its first generated solution is correct; this tests the model's ability to learn from in-context examples rather than from internal knowledge alone.
56.6%

Mathematics

Mathematical problems and computations
MGSM
0-shot accuracy (Self-reported)
53.1%

Reasoning

Logical reasoning and analysis
BIG-Bench Hard
Few-shot accuracy (Self-reported). Measures how well the model performs a new task given only a handful of in-context demonstrations, testing in-context learning and generalization rather than task-specific fine-tuning.
44.3%
DROP
1-shot F1 (Self-reported). Answers are scored with token-level F1 against the reference, which credits responses that are correct but phrased differently; DROP answers are often text spans rather than bare numbers, so exact-match scoring would be too strict (a minimal F1 sketch follows this entry).
53.9%
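A minimal sketch of token-overlap F1 in the style of extractive-QA scoring; the official DROP scorer adds number and punctuation normalization that this version omits:

```kotlin
// Token-overlap F1: lower-case, split on whitespace, then compute F1 over
// the multiset intersection of prediction and gold tokens.
fun tokenF1(prediction: String, gold: String): Double {
    val predTokens = prediction.lowercase().split(Regex("\\s+")).filter { it.isNotEmpty() }
    val goldTokens = gold.lowercase().split(Regex("\\s+")).filter { it.isNotEmpty() }
    if (predTokens.isEmpty() || goldTokens.isEmpty()) {
        return if (predTokens == goldTokens) 1.0 else 0.0
    }

    // Count shared tokens up to their frequency on each side.
    val goldCounts = goldTokens.groupingBy { it }.eachCount().toMutableMap()
    var common = 0
    for (t in predTokens) {
        val left = goldCounts[t] ?: 0
        if (left > 0) { common++; goldCounts[t] = left - 1 }
    }
    if (common == 0) return 0.0

    val precision = common.toDouble() / predTokens.size
    val recall = common.toDouble() / goldTokens.size
    return 2 * precision * recall / (precision + recall)
}

fun main() {
    println(tokenF1("the 1923 treaty", "treaty of 1923")) // ~0.667: partial overlap
}
```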
GPQA
Diamond subset, 0-shot RelaxedAccuracy (Self-reported). RelaxedAccuracy accepts an answer as correct when it falls within a small tolerance of the reference (for example, ±1% for numeric answers) or is an equivalent representation of the same value, rather than requiring an exact string match. This suits tasks where several solution paths and answer formats are valid, and where the goal is to assess understanding rather than formatting.
24.8%
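The page does not publish the exact relaxation rules behind this score; the sketch below is one plausible reading, assuming a ±1% numeric tolerance and light string normalization, not the official implementation:

```kotlin
// Hypothetical "relaxed" match: numeric answers count as correct within a
// relative tolerance; string answers compare case-insensitively after
// stripping trailing '%' / '.' and thousands separators.
fun relaxedMatch(prediction: String, reference: String, tolerance: Double = 0.01): Boolean {
    val p = prediction.trim().trimEnd('%', '.').replace(",", "")
    val r = reference.trim().trimEnd('%', '.').replace(",", "")

    val pNum = p.toDoubleOrNull()
    val rNum = r.toDoubleOrNull()
    if (pNum != null && rNum != null) {
        // Relative tolerance; require exact match when the reference is zero.
        return if (rNum == 0.0) pNum == 0.0
               else kotlin.math.abs(pNum - rNum) / kotlin.math.abs(rNum) <= tolerance
    }
    return p.equals(r, ignoreCase = true)
}

fun main() {
    println(relaxedMatch("3.141", "3.14159")) // true: within 1%
    println(relaxedMatch("42%", "42"))        // true after normalization
    println(relaxedMatch("Paris", "paris"))   // true: case-insensitive
}
```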

Other Tests

Specialized benchmarks
AIME 2025
0-shot accuracy (Self-reported). The model receives competition problems with no examples, hints, or problem-specific context, so the score reflects its out-of-the-box multi-step mathematical reasoning.
6.7%
ARC-C
25-shot accuracy (Self-reported)
51.7%
ARC-E
0-shot accuracy (Self-reported)
75.8%
BoolQ
0-shot accuracy (Self-reported)
76.4%
Codegolf v2.2
0-shot pass@1 (Self-reported). The model must produce a passing solution on its first attempt, without examples or task-specific tuning.
11.0%
ECLeKTic
0-shot ECLeKTic score (Self-reported). ECLeKTic is a closed-book question-answering benchmark of cross-lingual knowledge transfer: it probes whether facts the model learned in one language can be recalled when the question is asked in another.
2.5%
Global-MMLU
0-shot accuracy (Self-reported)
55.1%
Global-MMLU-Lite
0-shot accuracy (Self-reported)
59.0%
HiddenMath
0-shot accuracy (Self-reported)
27.7%
Include
0-shot accuracy (Self-reported)
38.6%
LiveCodeBench
0-shot pass@1 (Self-reported). Pass@1 is a strict metric: it counts only problems solved correctly on the first attempt, with no retries, examples, or hints, so it directly reflects out-of-the-box problem-solving ability.
13.2%
LiveCodeBench v5
0-shot pass@1 (Self-reported). A pass@1 of 0.75, for example, means the model produces a correct solution on its first attempt for 75% of the problems.
18.6%
MMLU-Pro
0-shot accuracy (Self-reported). Measures baseline accuracy on the task without any worked examples of what a correct answer should look like.
40.5%
MMLU-ProX
0-shot accuracy (Self-reported)
8.1%
Natural Questions
5-shot accuracy (Self-reported)
15.5%
PIQA
0-shot accuracy (Self-reported). The model receives only the question, with no solved examples, so the score measures how well it generalizes existing commonsense knowledge to tasks it sees for the first time.
78.9%
Social IQa
0-shot accuracy (Self-reported)
48.8%
TriviaQA
5-shot accuracy (Self-reported)
60.8%
WMT24++
ChrF, 0-shot (Self-reported). ChrF is a character-level F-score over n-grams (typically up to order 6) between the model output and the reference translation, computed here without any task-specific tuning. It works particularly well for morphologically rich languages, where word-level metrics penalize valid surface variants (a minimal implementation sketch follows).
42.7%
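For reference, a simplified ChrF sketch under common defaults (n-gram order 6, β = 2, whitespace stripped); sacreBLEU is the reference implementation, and this version omits its smoothing details:

```kotlin
import kotlin.math.min

// Simplified ChrF: average character n-gram precision and recall over
// orders 1..6, then combine with beta = 2 (recall weighted more heavily).
fun chrF(hypothesis: String, reference: String, maxOrder: Int = 6, beta: Double = 2.0): Double {
    fun ngrams(s: String, n: Int): Map<String, Int> =
        (0..s.length - n).map { s.substring(it, it + n) }
            .groupingBy { it }.eachCount()

    val hyp = hypothesis.replace(Regex("\\s+"), "")
    val ref = reference.replace(Regex("\\s+"), "")

    var precisionSum = 0.0
    var recallSum = 0.0
    var orders = 0
    for (n in 1..maxOrder) {
        val h = ngrams(hyp, n)
        val r = ngrams(ref, n)
        val hTotal = h.values.sum()
        val rTotal = r.values.sum()
        if (hTotal == 0 || rTotal == 0) continue
        val overlap = h.entries.sumOf { (gram, count) -> min(count, r[gram] ?: 0) }
        precisionSum += overlap.toDouble() / hTotal
        recallSum += overlap.toDouble() / rTotal
        orders++
    }
    if (orders == 0) return 0.0

    val p = precisionSum / orders
    val r = recallSum / orders
    if (p == 0.0 && r == 0.0) return 0.0
    val b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r) * 100 // scaled to 0-100 like reported scores
}

fun main() {
    println(chrF("the cat sat on the mat", "the cat is on the mat"))
}
```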

License & Metadata

License
gemma
Announcement Date
May 20, 2025
Last Updated
July 19, 2025

Similar Models


Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.