
Gemini 2.5 Flash-Lite

Multimodal
Google

Gemini 2.5 Flash-Lite is a model developed by Google DeepMind for tasks spanning reasoning, science, math, and code generation. It offers strong multilingual performance and long-context understanding, is optimized for low-latency use cases, and supports multimodal input with a 1 million token context window.

Key Specifications

Parameters
-
Context
1.0M
Release Date
June 17, 2025
Average Score
40.8%

Timeline

Key dates in the model's history
Announcement
June 17, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
-
Training Tokens
-
Knowledge Cutoff
January 1, 2025
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$0.10
Output (per 1M tokens)
$0.40
Max Input Tokens
1.0M
Max Output Tokens
65.5K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
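Given the listed per-1M-token prices, the cost of a request is simple arithmetic. A minimal sketch based only on the rates in the table above (the helper name and example token counts are illustrative, not from any official SDK):

```python
# Cost estimator using the listed Gemini 2.5 Flash-Lite rates.
INPUT_PRICE_PER_M = 0.10   # USD per 1M input tokens (from the pricing table)
OUTPUT_PRICE_PER_M = 0.40  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 200k-token input producing a 4k-token response
print(round(estimate_cost(200_000, 4_000), 4))  # 0.0216
```

Note that output tokens cost 4x input tokens at these rates, so long generations dominate the bill even though the context window allows very large inputs.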

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
ARC
ARC (Abstraction and Reasoning Corpus) tests abstract reasoning: each task provides a handful of input-output examples of a visual transformation, and the model must infer the underlying rule and apply it to a new input. Because every task uses a novel rule, the benchmark measures generalization rather than recall.
Self-reported
2.5%

Programming

Programming skills tests
SWE-Bench Verified
SWE-Bench Verified is a human-validated subset of SWE-bench. The model receives a real GitHub issue together with the repository's code and must produce a patch that resolves the issue; the patch is judged by running the project's test suite.
Self-reported
31.6%

Reasoning

Logical reasoning and analysis
GPQA
GPQA Diamond is the hardest subset of GPQA (Graduate-Level Google-Proof Q&A): expert-written multiple-choice questions in biology, physics, and chemistry, constructed so that answers cannot be found through simple web search and even skilled non-experts struggle to solve them.
Self-reported
64.6%

Multimodal

Working with images and visual data
MMMU
MMMU (Massive Multi-discipline Multimodal Understanding) evaluates college-level reasoning over images such as charts, diagrams, maps, tables, chemical structures, and photographs, across subjects ranging from art and business to science, medicine, and engineering.
Self-reported
72.9%

Other Tests

Specialized benchmarks
Aider-Polyglot
Aider-Polyglot measures code editing across multiple programming languages: the model is given existing code and a natural-language description of the desired change, and its edit is checked against the exercise's test suite. Code editing covers debugging (fixing syntax or logical errors), refactoring (improving structure without changing behavior), implementing new features from specifications, and transforming code between languages or frameworks. Evaluation considers functional correctness, test pass rate, code quality, and whether the model makes only the necessary changes.
Self-reported
26.7%
AIME 2025
Standard evaluation on problems from the 2025 American Invitational Mathematics Examination, scored by exact-answer accuracy.
Self-reported
49.8%
FACTS Grounding
FACTS Grounding measures whether a model's long-form answers stay grounded in a supplied source document: the model must respond using only the provided context, and responses are scored on both factual grounding and adequacy to the user request.
Self-reported
84.1%
Global-MMLU-Lite
Global-MMLU-Lite is a compact multilingual variant of MMLU: general-knowledge multiple-choice questions translated into a wide range of languages, measuring how well performance holds up outside English.
Self-reported
81.1%
Humanity's Last Exam
Humanity's Last Exam is a benchmark of extremely difficult questions written by subject-matter experts across many academic disciplines, intended to remain challenging even for frontier models; low scores are expected.
Self-reported
5.1%
LiveCodeBench
LiveCodeBench evaluates code generation on competitive-programming problems collected continuously after models' training cutoffs, limiting contamination; generated solutions are run against hidden test cases.
Self-reported
33.7%
MRCR v2
MRCR v2 is a long-context retrieval benchmark (multi-round co-reference resolution); the reported figure is the 8-needle score averaged over context lengths up to 128k tokens.
Self-reported
16.6%
SimpleQA
SimpleQA measures factual accuracy on short, fact-seeking questions with single indisputable answers; it penalizes confident hallucinations, so abstaining when unsure scores better than guessing.
Self-reported
10.7%
Vibe-Eval
Vibe-Eval is a multimodal benchmark from Reka: challenging image-and-prompt pairs that probe fine-grained visual understanding.
Self-reported
51.3%

License & Metadata

License
Creative Commons Attribution 4.0 License
Announcement Date
June 17, 2025
Last Updated
July 19, 2025

Similar Models

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.