
Gemini 2.0 Flash-Lite

Multimodal
Google

Gemini 2.0 Flash model optimized for cost efficiency and low latency

Key Specifications

Parameters
-
Context
1.0M
Release Date
February 5, 2025
Average Score
59.0%

Timeline

Key dates in the model's history
Announcement
February 5, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
-
Training Tokens
-
Knowledge Cutoff
June 1, 2024
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$0.07
Output (per 1M tokens)
$0.30
Max Input Tokens
1.0M
Max Output Tokens
8.2K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
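
For orientation, here is a minimal sketch of calling this model and estimating per-request cost from the list prices above. It assumes the google-genai Python SDK, the "gemini-2.0-flash-lite" model id, an API key in the GEMINI_API_KEY environment variable, and simple linear pricing with no caching or batch discounts; the usage-metadata field names are assumptions for illustration, not an official example.

    # Minimal sketch: one generate_content call plus a cost estimate from the
    # list prices above. Model id, env var, and usage-metadata field names are
    # assumptions, not verified against a specific SDK version.
    import os
    from google import genai

    INPUT_PRICE_PER_M = 0.07   # USD per 1M input tokens (pricing table above)
    OUTPUT_PRICE_PER_M = 0.30  # USD per 1M output tokens

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    response = client.models.generate_content(
        model="gemini-2.0-flash-lite",
        contents="Summarize the trade-offs of latency-optimized LLMs in three bullets.",
    )
    print(response.text)

    # Estimate the cost of this single request from the reported token counts.
    usage = response.usage_metadata
    cost = (usage.prompt_token_count / 1e6) * INPUT_PRICE_PER_M \
         + (usage.candidates_token_count / 1e6) * OUTPUT_PRICE_PER_M
    print(f"Estimated cost: ${cost:.6f}")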

Benchmark Results

Model performance metrics across various tests and benchmarks

Mathematics

Mathematical problems and computations
MATH
Competition-level mathematics problems. Self-reported.
86.8%

Reasoning

Logical reasoning and analysis
GPQA
Diamond subset: graduate-level, Google-proof multiple-choice questions in biology, physics, and chemistry. Self-reported.
51.5%

Multimodal

Working with images and visual data
MMMU
College-level multimodal understanding and reasoning across multiple disciplines. Self-reported.
68.0%

Other Tests

Specialized benchmarks
Bird-SQL (dev)
Text-to-SQL generation on the BIRD development set. Self-reported.
57.4%
CoVoST2
Automatic speech translation across 21 languages, scored with BLEU. Self-reported.
38.4%
EgoSchema
Long-form video question answering across multiple domains. Self-reported.
67.2%
FACTS Grounding
Factual grounding of responses in provided source documents. Self-reported.
83.6%
Global-MMLU-Lite
Multilingual MMLU subset, 0-shot evaluation. Self-reported.
78.2%
HiddenMath
Competition-level math problems from a held-out set crafted to avoid web leakage, 0-shot evaluation. Self-reported.
55.3%
LiveCodeBench v5
Code generation on recently published competitive-programming problems, scored as pass@1 (a single attempt per problem; see the pass@k sketch after this list). Self-reported.
28.9%
MMLU-Pro
A harder, more reasoning-intensive extension of MMLU with ten answer choices per question. Self-reported.
71.6%
MRCR 1M
Long-context understanding: retrieving and resolving references over inputs up to 1M tokens. Self-reported.
58.0%
SimpleQA
World-knowledge factual accuracy on short fact-seeking questions, with no access to search. Self-reported.
21.7%
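
The LiveCodeBench v5 score above is reported as pass@1: the fraction of problems solved by a single sampled solution. When n samples per problem are drawn, the standard unbiased estimator from the HumanEval paper generalizes this to pass@k; a small sketch follows (the sampling counts used for this model are not stated here):

    # Unbiased pass@k estimator (Chen et al., 2021): probability that at least
    # one of k samples, drawn without replacement from n generated solutions of
    # which c are correct, passes the tests. For k = 1 this reduces to c / n.
    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        if n - c < k:               # every size-k draw must contain a correct sample
            return 1.0
        return 1.0 - comb(n - c, k) / comb(n, k)

    print(pass_at_k(n=10, c=3, k=1))  # 0.3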

License & Metadata

License
proprietary
Announcement Date
February 5, 2025
Last Updated
July 19, 2025

Similar Models


Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.
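
As a purely hypothetical illustration of how such similarity-based recommendations could be scored (the catalog's actual ranking logic is not documented here), each candidate model can be compared on the listed characteristics and the closest matches surfaced; all weights and candidate values below are placeholders.

    # Hypothetical similarity scoring for "Similar Models" recommendations.
    # Weights, fields, and candidate values are illustrative placeholders only.
    from dataclasses import dataclass

    @dataclass
    class ModelCard:
        name: str
        developer: str
        multimodal: bool
        params_b: float | None    # parameter count in billions, None if undisclosed
        avg_score: float          # benchmark average, 0-100

    def similarity(a: ModelCard, b: ModelCard) -> float:
        score = 0.0
        score += 0.3 if a.developer == b.developer else 0.0
        score += 0.2 if a.multimodal == b.multimodal else 0.0
        if a.params_b is not None and b.params_b is not None:
            score += 0.2 * (1 - abs(a.params_b - b.params_b) / max(a.params_b, b.params_b))
        score += 0.3 * (1 - abs(a.avg_score - b.avg_score) / 100)
        return score  # higher means more similar

    flash_lite = ModelCard("Gemini 2.0 Flash-Lite", "Google", True, None, 59.0)
    candidate = ModelCard("(hypothetical candidate)", "Google", True, None, 66.0)
    print(round(similarity(flash_lite, candidate), 3))  # 0.779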