Gemma 3 27B

Multimodal
Google

Gemma 3 27B is a multimodal language model from Google with 27 billion parameters that accepts text and image input and generates text output. The model has a 128K-token (131,072) context window, multilingual support, and open weights, making it suitable for complex question answering, summarization, logical reasoning, and image-understanding tasks.
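Because the model accepts interleaved text and image input, a single request can carry both. A minimal sketch, assuming Gemma 3 27B is served behind an OpenAI-compatible chat endpoint; the base URL, API key, and model id below are placeholders that vary by provider:

```python
# Hypothetical text+image request to Gemma 3 27B via an OpenAI-compatible
# endpoint. Substitute your provider's base URL, API key, and model id.
import base64

from openai import OpenAI

client = OpenAI(
    base_url="https://example-provider.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

# Encode a local image as a base64 data URL for the image content part.
with open("chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="google/gemma-3-27b-it",  # model id varies by provider
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the trend in this chart."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)  # output is text only
```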

Key Specifications

Parameters
27.0B
Context
131.1K
Release Date
March 12, 2025
Average Score
65.4%

Timeline

Key dates in the model's history
Announcement
March 12, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
27.0B
Training Tokens
14.0T tokens
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal · ZeroEval

Pricing & Availability

Input (per 1M tokens)
$0.11
Output (per 1M tokens)
$0.20
Max Input Tokens
131.1K
Max Output Tokens
131.1K
Supported Features
Function Calling · Structured Output · Code Execution · Web Search · Batch Inference · Fine-tuning
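At the listed rates, per-request cost is linear in token counts. A quick sketch using the prices and the 131.1K (131,072) token limits from the fields above; the example token counts are illustrative:

```python
# Back-of-the-envelope cost estimate from the listed Gemma 3 27B prices.
INPUT_PRICE_PER_M = 0.11   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.20  # USD per 1M output tokens
MAX_TOKENS = 131_072       # 131.1K max input/output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at the listed per-token rates."""
    assert input_tokens <= MAX_TOKENS and output_tokens <= MAX_TOKENS
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 50K-token document summarized into a 1K-token answer.
print(f"${request_cost(50_000, 1_000):.4f}")  # -> $0.0057
```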

Benchmark Results

Model performance metrics across various tests and benchmarks

Programming

Programming skills tests
HumanEval
0-shot evaluation · Self-reported
87.8%
MBPP
3-shot evaluation · Self-reported
74.4%

Mathematics

Mathematical problems and computations
GSM8k
0-shot evaluation · Self-reported
95.9%
MATH
0-shot evaluation · Self-reported
89.0%

Reasoning

Logical reasoning and analysis
BIG-Bench Hard
0-shot evaluation · Self-reported
87.6%
GPQA
0-shot evaluation (Diamond subset) · Self-reported
42.4%

Multimodal

Working with images and visual data
AI2D
Multimodal evaluation · Self-reported
84.5%
ChartQA
Multimodal evaluation · Self-reported
78.0%
DocVQA
Multimodal evaluation · Self-reported
86.6%

Other Tests

Specialized benchmarks
BIG-Bench Extra Hard
0-shot evaluation · Self-reported
19.3%
Bird-SQL (dev)
Self-reported
54.4%
ECLeKTic
0-shot evaluation · Self-reported
16.7%
FACTS Grounding
Self-reported
74.9%
Global-MMLU-Lite
0-shot evaluation · Self-reported
75.1%
HiddenMath
0-shot evaluation · Self-reported
60.3%
IFEval
0-shot evaluation · Self-reported
90.4%
InfoVQA
Multimodal evaluation · Self-reported
70.6%
LiveCodeBench
0-shot evaluation · Self-reported
29.7%
MathVista-Mini
Multimodal evaluation · Self-reported
67.6%
MMLU-Pro
0-shot evaluation · Self-reported
67.5%
MMMU (val)
Multimodal evaluation · Self-reported
64.9%
Natural2Code
0-shot evaluation · Self-reported
84.5%
SimpleQA
0-shot evaluation · Self-reported
10.0%
TextVQA
Multimodal evaluation · Self-reported
65.1%
VQAv2 (val)
Multimodal evaluation · Self-reported
71.0%
WMT24++
0-shot evaluation · Self-reported
53.4%
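For reference, the headline Average Score of 65.4% is consistent with the unweighted mean of the 26 self-reported scores listed above; a quick check:

```python
# Unweighted mean of the 26 benchmark scores listed in this section.
scores = [
    87.8, 74.4,              # HumanEval, MBPP
    95.9, 89.0,              # GSM8k, MATH
    87.6, 42.4,              # BIG-Bench Hard, GPQA
    84.5, 78.0, 86.6,        # AI2D, ChartQA, DocVQA
    19.3, 54.4, 16.7, 74.9,  # BIG-Bench Extra Hard, Bird-SQL, ECLeKTic, FACTS Grounding
    75.1, 60.3, 90.4, 70.6,  # Global-MMLU-Lite, HiddenMath, IFEval, InfoVQA
    29.7, 67.6, 67.5, 64.9,  # LiveCodeBench, MathVista-Mini, MMLU-Pro, MMMU (val)
    84.5, 10.0, 65.1, 71.0,  # Natural2Code, SimpleQA, TextVQA, VQAv2 (val)
    53.4,                    # WMT24++
]
print(f"{sum(scores) / len(scores):.1f}%")  # -> 65.4%
```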

License & Metadata

License
Gemma (Gemma Terms of Use)
Announcement Date
March 12, 2025
Last Updated
July 19, 2025
