Gemma 3 27B

Multimodal
Google

Gemma 3 27B is a multimodal language model from Google with 27 billion parameters that accepts text and image input and generates text output. The model has a 128K-token (131,072) context window, multilingual support, and open weights, making it suitable for complex question answering, summarization, logical reasoning, and image-understanding tasks.
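Because the model accepts interleaved text and image input, a single request can carry both. A minimal sketch, assuming Gemma 3 27B is served behind an OpenAI-compatible chat endpoint; the base URL, API key, and model id below are placeholders that vary by provider:

```python
# Hypothetical text+image request to Gemma 3 27B via an OpenAI-compatible
# endpoint. Substitute your provider's base URL, API key, and model id.
import base64

from openai import OpenAI

client = OpenAI(
    base_url="https://example-provider.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

# Encode a local image as a base64 data URL for the image content part.
with open("chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="google/gemma-3-27b-it",  # model id varies by provider
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the trend in this chart."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)  # output is text only
```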

Key Specifications

Parameters
27.0B
Context
131.1K
Release Date
March 12, 2025
Average Score
65.4%

Timeline

Key dates in the model's history
Announcement
March 12, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
27.0B
Training Tokens
14.0T tokens
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal · ZeroEval

Pricing & Availability

Input (per 1M tokens)
$0.11
Output (per 1M tokens)
$0.20
Max Input Tokens
131.1K
Max Output Tokens
131.1K
Supported Features
Function Calling · Structured Output · Code Execution · Web Search · Batch Inference · Fine-tuning
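At the listed rates, per-request cost is linear in token counts. A quick sketch using the prices and the 131.1K (131,072) token limits from the fields above; the example token counts are illustrative:

```python
# Back-of-the-envelope cost estimate from the listed Gemma 3 27B prices.
INPUT_PRICE_PER_M = 0.11   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.20  # USD per 1M output tokens
MAX_TOKENS = 131_072       # 131.1K max input/output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at the listed per-token rates."""
    assert input_tokens <= MAX_TOKENS and output_tokens <= MAX_TOKENS
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 50K-token document summarized into a 1K-token answer.
print(f"${request_cost(50_000, 1_000):.4f}")  # -> $0.0057
```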

Benchmark Results

Model performance metrics across various tests and benchmarks

Programming

Programming skills tests
HumanEval
0-shot evaluation · Self-reported
87.8%
MBPP
3-shot evaluation · Self-reported
74.4%

Mathematics

Mathematical problems and computations
GSM8k
0-shot evaluation · Self-reported
95.9%
MATH
0-shot evaluation · Self-reported
89.0%

Reasoning

Logical reasoning and analysis
BIG-Bench Hard
0-shot evaluation · Self-reported
87.6%
GPQA
0-shot evaluation (Diamond subset) · Self-reported
42.4%

Multimodal

Working with images and visual data
AI2D
Multimodal evaluation · Self-reported
84.5%
ChartQA
Multimodal evaluation · Self-reported
78.0%
DocVQA
Multimodal evaluation · Self-reported
86.6%

Other Tests

Specialized benchmarks
BIG-Bench Extra Hard
0-shot evaluation · Self-reported
19.3%
Bird-SQL (dev)
Self-reported
54.4%
ECLeKTic
0-shot evaluation · Self-reported
16.7%
FACTS Grounding
Self-reported
74.9%
Global-MMLU-Lite
0-shot evaluation · Self-reported
75.1%
HiddenMath
0-shot evaluation · Self-reported
60.3%
IFEval
0-shot evaluation · Self-reported
90.4%
InfoVQA
Multimodal evaluation · Self-reported
70.6%
LiveCodeBench
0-shot evaluation · Self-reported
29.7%
MathVista-Mini
Multimodal evaluation · Self-reported
67.6%
MMLU-Pro
0-shot evaluation · Self-reported
67.5%
MMMU (val)
Multimodal evaluation · Self-reported
64.9%
Natural2Code
0-shot evaluation · Self-reported
84.5%
SimpleQA
0-shot evaluation · Self-reported
10.0%
TextVQA
Multimodal evaluation · Self-reported
65.1%
VQAv2 (val)
Multimodal evaluation · Self-reported
71.0%
WMT24++
0-shot evaluation · Self-reported
53.4%
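For reference, the headline Average Score of 65.4% is consistent with the unweighted mean of the 26 self-reported scores listed above; a quick check:

```python
# Unweighted mean of the 26 benchmark scores listed in this section.
scores = [
    87.8, 74.4,              # HumanEval, MBPP
    95.9, 89.0,              # GSM8k, MATH
    87.6, 42.4,              # BIG-Bench Hard, GPQA
    84.5, 78.0, 86.6,        # AI2D, ChartQA, DocVQA
    19.3, 54.4, 16.7, 74.9,  # BIG-Bench Extra Hard, Bird-SQL, ECLeKTic, FACTS Grounding
    75.1, 60.3, 90.4, 70.6,  # Global-MMLU-Lite, HiddenMath, IFEval, InfoVQA
    29.7, 67.6, 67.5, 64.9,  # LiveCodeBench, MathVista-Mini, MMLU-Pro, MMMU (val)
    84.5, 10.0, 65.1, 71.0,  # Natural2Code, SimpleQA, TextVQA, VQAv2 (val)
    53.4,                    # WMT24++
]
print(f"{sum(scores) / len(scores):.1f}%")  # -> 65.4%
```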

License & Metadata

License
Gemma (Gemma Terms of Use)
Announcement Date
March 12, 2025
Last Updated
July 19, 2025
