Gemini 2.0 Flash Thinking
Multimodal
Gemini 2.0 Flash Thinking is an enhanced reasoning model that exposes its thought process, improving both performance and explainability. Combining speed with strong performance, it excels at science and math tasks, showing its reasoning as it works through complex problems.
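As a quick orientation, below is a minimal sketch of calling the model with the google-generativeai Python SDK. The model ID string is an assumption inferred from the January 21, 2025 release and should be checked against the currently published model list.

```python
# Minimal sketch using the google-generativeai SDK (pip install google-generativeai).
# The model ID "gemini-2.0-flash-thinking-exp-01-21" is assumed from the release date;
# verify it against the current model list before relying on it.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; supply a real API key

model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-01-21")
response = model.generate_content(
    "A cyclist rides 36 km in 90 minutes. What is the average speed in km/h? "
    "Show your reasoning step by step."
)
print(response.text)  # prints the model's reply
```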
Key Specifications
Parameters
-
Context
-
Release Date
January 21, 2025
Average Score
74.3%
Timeline
Key dates in the model's history
Announcement
January 21, 2025
Last Update
July 19, 2025
Technical Specifications
Parameters
-
Training Tokens
-
Knowledge Cutoff
August 1, 2024
Family
-
Capabilities
Multimodal
ZeroEval
Benchmark Results
Model performance metrics across various tests and benchmarks
Reasoning
Logical reasoning and analysis
GPQA
Challenging science questions requiring chain-of-thought reasoning
AI systems have made tremendous strides in answering factual questions, but complex science problems that require multi-step reasoning and domain knowledge remain challenging. This task involves a set of science questions from various domains (physics, chemistry, biology, etc.) that require the model to:
1. Break down complex problems into logical steps
2. Apply scientific principles and formulas correctly
3. Reason through each step sequentially
4. Show calculations when necessary
5. Arrive at accurate conclusions
The questions are designed to test both factual knowledge and the ability to use that knowledge in a logical reasoning chain. For example, a physics problem might require calculating forces, then using those values to determine whether an object will move, and finally explaining the real-world implications (a worked sketch follows below).
Success on this task requires not just memorized facts but the ability to connect concepts across domains and apply them appropriately in novel scenarios - mirroring how human experts solve scientific problems. • Self-reported
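To make the force example above concrete, here is a small worked calculation in the spirit of such a problem. All numbers are invented for illustration and are not drawn from GPQA itself.

```python
# Illustrative multi-step physics reasoning (invented values, not a GPQA item):
# does a 10 kg crate move when pushed horizontally with 60 N on a surface
# whose coefficient of static friction is 0.5?

g = 9.81                 # gravitational acceleration in m/s^2
mass = 10.0              # crate mass in kg
mu_static = 0.5          # coefficient of static friction
applied_force = 60.0     # horizontal push in N

normal_force = mass * g                             # step 1: N = m * g = 98.1 N
max_static_friction = mu_static * normal_force      # step 2: f_max = mu * N = 49.05 N
crate_moves = applied_force > max_static_friction   # step 3: compare push to f_max

print(f"maximum static friction: {max_static_friction:.2f} N")
print("the crate starts to move" if crate_moves else "the crate stays put")
```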
Multimodal
Working with images and visual data
MMMU
College-level questions and answers spanning many fields (science, business, humanities, health, art, and engineering) that pair text with images such as charts, diagrams, and maps, testing whether the model can reason jointly over visual and textual information • Self-reported
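A multimodal request with the same SDK looks roughly like the sketch below; the image path, prompt, and model ID are placeholders and assumptions, not part of the benchmark.

```python
# Rough sketch of a multimodal (image + text) request; "chart.png" is a placeholder
# and the model ID is the same assumption as in the earlier example.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-01-21")

chart = Image.open("chart.png")
response = model.generate_content(
    [chart, "Which category in this chart grew the fastest, and by roughly how much?"]
)
print(response.text)
```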
Other Tests
Specialized benchmarks
AIME 2024
Competition-level mathematics problems from the American Invitational Mathematics Examination, testing multi-step mathematical reasoning • Self-reported
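AIME problems are short-answer competition questions whose results can often be sanity-checked by brute force. The problem below is an invented example in that style, not an actual AIME 2024 item.

```python
# Invented AIME-style counting question (not from AIME 2024):
# how many integers 1 <= n <= 1000 are divisible by 3 or by 5, but not by both?
count = sum(
    1 for n in range(1, 1001)
    if (n % 3 == 0) != (n % 5 == 0)   # exactly one of the two conditions holds
)
print(count)  # 401, matching inclusion-exclusion: 333 + 200 - 2 * 66
```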
License & Metadata
License
proprietary
Announcement Date
January 21, 2025
Last Updated
July 19, 2025
Similar Models
Gemini 1.5 Pro
MM
Best score:0.9 (MMLU)
Released:May 2024
Price:$2.50/1M tokens
Gemini 1.5 Flash
MM
Best score:0.8 (MMLU)
Released:May 2024
Price:$0.15/1M tokens
Gemini 2.5 Flash-Lite
MM
Best score:0.6 (GPQA)
Released:Jun 2025
Price:$0.10/1M tokens
Gemini 2.0 Flash
MM
Best score:0.6 (GPQA)
Released:Dec 2024
Price:$0.10/1M tokens
Gemini 2.0 Flash-Lite
MM
Best score:0.5 (GPQA)
Released:Feb 2025
Price:$0.07/1M tokens
Gemini 3.1 Pro
MM
Best score:0.9 (GPQA)
Released:Feb 2026
Price:$2.50/1M tokens
Gemini 2.5 Pro
MM
Best score:0.8 (GPQA)
Released:May 2025
Price:$1.25/1M tokens
Gemini 2.5 Pro Preview 06-05
MM
Best score:0.9 (GPQA)
Released:Jun 2025
Price:$1.25/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.