Gemini 2.0 Flash Thinking

Multimodal
Google

Gemini 2.0 Flash Thinking is an enhanced reasoning model that can show its thought process, improving both performance and explainability. Combining speed with strong reasoning, it excels at science and math tasks, showing its work when solving complex problems.
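
As a minimal sketch of how the model can be queried, here is a Python example using Google's google-genai SDK; the experimental model ID and the prompt are illustrative assumptions, not taken from this page.

# Minimal sketch: querying Gemini 2.0 Flash Thinking with the google-genai SDK.
# The model ID "gemini-2.0-flash-thinking-exp-01-21" is an assumption based on
# Google's experimental naming at the January 21, 2025 release.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # assumes a key from Google AI Studio

response = client.models.generate_content(
    model="gemini-2.0-flash-thinking-exp-01-21",
    contents="A train travels at 80 km/h. How long does it take to cover 200 km?",
)

# The thinking variant reasons through the problem before answering;
# response.text holds the final answer.
print(response.text)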

Key Specifications

Parameters
-
Context
-
Release Date
January 21, 2025
Average Score
74.3%

Timeline

Key dates in the model's history
Announcement
January 21, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
-
Training Tokens
-
Knowledge Cutoff
August 1, 2024
Family
-
Capabilities
Multimodal, ZeroEval

Benchmark Results

Model performance metrics across various tests and benchmarks

Reasoning

Logical reasoning and analysis
GPQA
Challenging science questions requiring chain-of-thought reasoning. AI systems have made tremendous strides in answering factual questions, but complex science problems that require multi-step reasoning and domain knowledge remain challenging. This task involves a set of science questions from various domains (physics, chemistry, biology, etc.) that require the model to:
1. Break down complex problems into logical steps
2. Apply scientific principles and formulas correctly
3. Reason through each step sequentially
4. Show calculations when necessary
5. Arrive at accurate conclusions
The questions test both factual knowledge and the ability to use that knowledge in a logical reasoning chain. For example, a physics problem might require calculating forces, then using those values to determine whether an object will move, and finally explaining the real-world implications. Success requires not just memorized facts but the ability to connect concepts across domains and apply them appropriately in novel scenarios, mirroring how human experts solve scientific problems.
Self-reported
74.2%
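
To make the scoring concrete, here is a hypothetical sketch of a chain-of-thought evaluation loop; the question fields and the ask_model helper are illustrative assumptions, not GPQA's official harness.

# Hypothetical benchmark loop: prompt for step-by-step reasoning, then score
# the final line. ask_model and the record fields are illustrative assumptions.
def evaluate(questions, ask_model):
    correct = 0
    for q in questions:
        prompt = (
            f"{q['question']}\n"
            f"Choices: {', '.join(q['choices'])}\n"
            "Think step by step, then give the final choice on the last line."
        )
        reply = ask_model(prompt)
        answer = reply.strip().splitlines()[-1]  # final line holds the choice
        if q['answer'] in answer:
            correct += 1
    return correct / len(questions)  # accuracy, e.g. 0.742 for the score above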

Multimodal

Working with images and visual data
MMMU
Multi-discipline questions spanning various fields (science, art, engineering, medicine, and more) that require understanding images, diagrams, and charts alongside text.
Self-reported
75.4%

Other Tests

Specialized benchmarks
AIME 2024
Competition problems from the American Invitational Mathematics Examination, testing multi-step mathematical reasoning.
Self-reported
73.3%

License & Metadata

License
proprietary
Announcement Date
January 21, 2025
Last Updated
July 19, 2025

Similar Models

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.
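
To illustrate, a hypothetical similarity score over those four characteristics might look like the sketch below; the weights and record fields are assumptions, not the catalog's actual formula.

# Hypothetical similarity ranking over the listed characteristics;
# the weights and record fields are illustrative assumptions.
def similarity(a, b):
    score = 0.0
    score += 0.35 * (a["developer"] == b["developer"])
    score += 0.25 * (a["multimodal"] == b["multimodal"])
    if a.get("parameters") and b.get("parameters"):
        ratio = min(a["parameters"], b["parameters"]) / max(a["parameters"], b["parameters"])
        score += 0.15 * ratio  # closer parameter counts score higher
    score += 0.25 * (1.0 - abs(a["avg_score"] - b["avg_score"]))  # scores in [0, 1]
    return score

def recommend(model, catalog, k=3):
    others = [m for m in catalog if m["name"] != model["name"]]
    return sorted(others, key=lambda m: similarity(model, m), reverse=True)[:k]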