
Gemini Diffusion

Google

Gemini Diffusion is an experimental text diffusion model from Google DeepMind. It explores a new kind of language model intended to give users greater control, creativity, and speed in text generation. Instead of predicting text token by token, it generates output by gradually refining noise, which enables rapid iteration and error correction during generation. Key capabilities include fast sampling (a claimed 1,479 tokens/sec, excluding overhead), more coherent text from emitting entire blocks of tokens at once, and iterative refinement for consistent results. It is reported to excel at editing tasks, including in math and coding contexts.
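The generation process described above can be illustrated with a toy sketch. This is not Gemini Diffusion's actual architecture: the `toy_denoiser` below is a hypothetical stand-in for the learned denoising network, and `TARGET` stands in for whatever output the model converges toward. The point is only the control flow, starting from noise over a whole block and refining every position at each step rather than appending tokens one at a time.

```python
import random

random.seed(0)

VOCAB = list("abcdefgh")
TARGET = list("deadbeef")  # hypothetical stand-in for the model's intended output


def toy_denoiser(tokens, step, total_steps):
    """Hypothetical stand-in for a learned denoising network: at each
    step, every position has a growing chance of being corrected, so the
    block converges over the refinement schedule."""
    out = tokens[:]
    for i in range(len(out)):
        if out[i] != TARGET[i] and random.random() < (step + 1) / total_steps:
            out[i] = TARGET[i]
    return out


def generate(block_len=8, steps=8):
    # Start from pure noise over the whole block (not token by token).
    tokens = [random.choice(VOCAB) for _ in range(block_len)]
    for step in range(steps):
        # Refine all positions at once; earlier "mistakes" can still be
        # corrected at later steps (iterative error correction).
        tokens = toy_denoiser(tokens, step, steps)
    return "".join(tokens)


print(generate())
```

Because every position is revisited at every step, the final refinement pass can fix any remaining mismatch, which is the property the model card attributes to iterative refinement.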

Key Specifications

Parameters
-
Context
-
Release Date
May 20, 2025
Average Score
46.9%

Timeline

Key dates in the model's history
Announcement
May 20, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval

Benchmark Results

Model performance metrics across various tests and benchmarks

Programming

Programming skills tests
HumanEval
pass @1 (self-reported)
89.6%
MBPP
pass @1: each task from the set is attempted once, and the model's first generated solution is evaluated. Where a prompting strategy is used (e.g., Chain-of-Thought or tool-augmented Chain-of-Thought), the model is prompted to apply that method and the final answer is then verified (self-reported)
76.0%
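The pass @1 metric used throughout these benchmark entries is a special case of pass@k. A common way to compute it, sketched here under the assumption that `n` samples are drawn per task and `c` of them pass the tests, is the standard unbiased estimator (the function name `pass_at_k` is illustrative, not from this model card):

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n generated samples per task,
    of which c pass, estimate the probability that at least one of k
    randomly drawn samples passes."""
    if n - c < k:
        # Fewer failing samples than draws: at least one draw must pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# pass@1 reduces to the raw per-sample pass rate c/n.
print(pass_at_k(10, 3, 1))
```

With a single sample per task (n = 1, k = 1), as in the self-reported numbers above, the estimator is simply the fraction of tasks whose first solution passes.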
SWE-Bench Verified
pass @1 evaluation, 32K prompt (self-reported)
22.9%

Reasoning

Logical reasoning and analysis
GPQA
pass @1 (self-reported)
40.4%

Other Tests

Specialized benchmarks
AIME 2025
pass @1 (self-reported)
23.3%
BIG-Bench Extra Hard
pass @1 (self-reported)
15.0%
BigCodeBench
pass @1 (self-reported)
45.4%
Global-MMLU-Lite
First-attempt accuracy: measures whether the model gives a correct answer on its first attempt. In real use, users want a correct answer to their task on the first try, without additional prompting or follow-up questions, so the ability to answer correctly on the first attempt is a practically relevant metric (self-reported)
69.1%
LBPP (v2)
pass @1 (self-reported)
56.8%
LiveCodeBench
pass @1: a solution is judged solely by the correctness of its final answer; intermediate errors in the solution do not count against it. The system scores 1 if it generates the correct final answer and 0 otherwise. This scoring suits applications where the user cares only about the final answer, not the correctness of each step (self-reported)
30.9%
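The final-answer-only scoring described in the LiveCodeBench entry can be sketched as follows. This is an assumed, minimal implementation; the function name, the normalization (strip and lowercase), and the sample data are all illustrative, not taken from the benchmark itself:

```python
def final_answer_score(predicted: str, reference: str) -> int:
    """Binary score: 1 if the final answers match after light
    normalization, 0 otherwise. Intermediate steps are not checked."""
    norm = lambda s: s.strip().lower()
    return int(norm(predicted) == norm(reference))


# Hypothetical (predicted, reference) final-answer pairs.
results = [
    ("42", "42"),    # correct final answer; reasoning steps ignored
    (" 42 ", "42"),  # whitespace differences tolerated by normalization
    ("41", "42"),    # wrong final answer scores 0
]
accuracy = sum(final_answer_score(p, r) for p, r in results) / len(results)
print(accuracy)
```

Aggregating these 0/1 scores over a task set yields the percentage figures reported in this section.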

License & Metadata

License
proprietary
Announcement Date
May 20, 2025
Last Updated
July 19, 2025
