Gemma 3n E2B Instructed
Gemma 3n is a multimodal model designed for local hardware deployment, accepting image, text, audio, and video inputs and generating text outputs. It comprises a language decoder, an audio encoder, and a vision encoder, and is available in two sizes: E2B and E4B. The model is optimized for efficient memory usage, allowing it to run on devices with limited GPU RAM, such as laptops, desktops, or private cloud infrastructure.

Gemma is a family of lightweight, state-of-the-art open models from Google, built on the same research and technology used to create the Gemini models. Gemma models are well suited to content understanding tasks such as question answering, summarization, and reasoning. Their relatively small size enables deployment in resource-constrained environments, democratizing access to state-of-the-art AI and fostering innovation for everyone. Gemma 3n's instruction-tuned variants are released with open weights, and the models were trained on data covering over 140 spoken languages.
Benchmark Results
Model performance metrics span General Knowledge, Programming, Mathematics, Reasoning, and other tests.
Similar Models
Gemma 3n E2B
Gemma 3 4B
Gemma 3n E2B Instructed LiteRT (Preview)
Gemma 3n E4B Instructed
Gemini 1.5 Flash 8B
MedGemma 4B IT
Gemma 3n E4B Instructed LiteRT Preview
Gemma 3n E4B
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.