Key Specifications
Parameters
8.0B
Context
1.0M
Release Date
March 15, 2024
Average Score
60.5%
Timeline
Key dates in the model's history
Announcement
March 15, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
8.0B
Training Tokens
-
Knowledge Cutoff
October 1, 2024
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$0.07
Output (per 1M tokens)
$0.30
Max Input Tokens
1.0M
Max Output Tokens
8.2K
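At the listed rates ($0.07 per 1M input tokens, $0.30 per 1M output tokens), the cost of a request is a simple linear function of the token counts. A minimal sketch:

```python
# Per-token pricing taken from the card above (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.07
OUTPUT_PRICE_PER_M = 0.30

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 100K-token prompt with an 8K-token completion.
cost = request_cost(100_000, 8_000)
print(f"${cost:.4f}")  # 0.007 + 0.0024 = $0.0094
```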
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
Mathematics
Mathematical problems and computations
MATH
Accuracy in solving mathematical tasks. We evaluate the model's mathematical abilities on a set of assignments grouped by mathematics level, such as problems from the American Invitational Mathematics Examination (AIME) and the USA Mathematical Olympiad (USAMO). These tasks require a deep understanding of mathematical concepts, structured thinking, and a sound approach to problem solving. We compare the model's answers with reference solutions using the following criteria: correctness of the final answer, accuracy of the mathematical reasoning at each step, appropriate use of mathematical methods, and clarity of explanations. Evaluation is performed automatically where possible, and with expert review for complex cases. We also analyze the errors the model makes and classify them by type (computational errors, logical errors, and so on). These metrics let us determine how well the model can solve tasks requiring mathematical knowledge, and track progress in this field compared with earlier versions and other models. • Self-reported
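The answer-comparison step described above can be sketched as exact-match accuracy over normalized final answers (a toy illustration; real graders for AIME/USAMO-style problems also check the reasoning steps, which this sketch omits):

```python
def normalize(answer: str) -> str:
    """Crude normalization: strip whitespace, a leading '$', a trailing '.'."""
    return answer.strip().lstrip("$").rstrip(".").replace(" ", "")

def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions whose normalized form equals the reference's."""
    assert len(predictions) == len(references)
    correct = sum(normalize(p) == normalize(r)
                  for p, r in zip(predictions, references))
    return correct / len(references)

preds = ["42", " 3/4 ", "x = 7"]
refs  = ["42", "3/4",   "x=7"]
print(exact_match_accuracy(preds, refs))  # 1.0
```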
Reasoning
Logical reasoning and analysis
GPQA
Accuracy of answers to expert-level scientific questions. The model is evaluated on 100 questions selected from the Google-Proof Questions and Answers (GPQA) set of roughly 600 scientific questions written by domain experts to verify model knowledge. GPQA questions are constructed so that simple lookup does not reveal the answer, which makes them a measure of the knowledge the model can actually use when answering. The questions span various scientific disciplines. Each model answer is compared with the GPQA reference answer to determine its accuracy; an answer counts as correct if it matches the reference, even if it is phrased differently. • Self-reported
Multimodal
Working with images and visual data
MathVista
Accuracy of mathematical reasoning over visual inputs such as charts, plots, and diagrams. • Self-reported
MMMU
Accuracy of multimodal understanding. The model answers questions about images, charts, and other visual materials, which requires interpreting images correctly, extracting textual and numerical data, and combining visual information with reasoning. Strong results reflect exact description of key image elements, correct reading of numerical data, and sound use of visual information in answers; weak results show up as errors in reading text or numerical data from images, or failure on questions that require genuine visual understanding. • Self-reported
Other Tests
Specialized benchmarks
FLEURS
Speech recognition accuracy, measured as 1 − WER (word error rate). • Self-reported
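WER is the word-level edit distance between the hypothesis transcript and the reference, divided by the reference length; FLEURS reports 1 − WER so that higher is better. A minimal sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[-1][-1] / len(ref)

ref = "the cat sat on the mat"
hyp = "the cat sat on mat"      # one word dropped -> 1 error over 6 words
print(1 - wer(ref, hyp))        # 1 - 1/6, about 0.833
```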
HiddenMath
Accuracy of solutions to competition-level mathematical tasks. AI models are measured on their ability to solve challenging math problems selected from prestigious competitions such as the AIME, FrontierMath, or the Harvard-MIT Mathematics Tournament. These problems typically require multi-step reasoning, creative application of mathematical concepts, and formal symbolic manipulation. Evaluation focuses on both the final answer accuracy and the correctness of the solution path, including intermediate steps and justifications. Problems may span various fields of mathematics, including algebra, number theory, geometry, and combinatorics. This benchmark is particularly valuable for assessing an AI's: - Formal reasoning capabilities - Understanding of mathematical concepts - Ability to organize complex, multi-step solutions - Mathematical precision and rigor Performance is often reported as the percentage of problems solved correctly, sometimes broken down by difficulty level or mathematical domain. • Self-reported
MMLU-Pro
Accuracy of choosing the correct answer from several options on MMLU-Pro, an extension of the MMLU dataset with tasks of increased complexity. • Self-reported
MRCR
Accuracy of long-context understanding. We measure the model's ability to answer questions accurately using information placed somewhere in a long context. The approach is to give the model a long document and then ask questions whose answers are contained in it. We check how accuracy depends on: where in the context the information appears (near the beginning or the end), and the query type (a direct lookup of the information, or a query that requires reasoning on top of it). We measure two types of accuracy: 1. Extraction accuracy: can the model find the information in the context? 2. Understanding accuracy: can the model draw correct conclusions based on the information in the context? • Self-reported
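A toy version of the placement experiment in this kind of long-context test: insert a known fact (the "needle") at different positions in filler text, query the model, and record extraction accuracy per position. The `ask_model` callable here is a hypothetical stand-in for a real model call, not part of any actual API.

```python
def build_context(filler_sentences: list[str], needle: str, position: float) -> str:
    """Insert the needle at a relative position (0.0 = start, 1.0 = end)."""
    idx = round(position * len(filler_sentences))
    return " ".join(filler_sentences[:idx] + [needle] + filler_sentences[idx:])

def extraction_accuracy(ask_model, filler, needle, question, answer, positions):
    """Fraction of needle positions at which the model's reply contains the answer."""
    hits = 0
    for pos in positions:
        prompt = build_context(filler, needle, pos) + "\n\n" + question
        reply = ask_model(prompt)
        hits += answer.lower() in reply.lower()
    return hits / len(positions)

# Example with a trivial 'model' that just echoes its prompt back:
filler = [f"Filler sentence number {i}." for i in range(100)]
acc = extraction_accuracy(lambda prompt: prompt,
                          filler,
                          needle="The secret code is 7421.",
                          question="What is the secret code?",
                          answer="7421",
                          positions=[0.0, 0.5, 1.0])
print(acc)  # 1.0 -- the echoed prompt always contains the needle
```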
Natural2Code
Success rate on code-generation tasks across programming languages. We analyze how well the model performs code generation in different languages, measured as the proportion of successfully completed tasks per language: Python, JavaScript, Java, C++, Go, Rust. Method: 1. A set of 20 programming tasks for each language. 2. Tasks include: algorithmic problems (search, etc.), work with data structures, work with APIs, and error handling. 3. For each task, the model produces a solution, and the code is checked for correctness via tests. 4. The percentage of passing solutions is computed for each language. 5. Errors and general patterns are analyzed. This evaluation measures the model's ability to generate code in different programming languages and reveals its specific strong and weak sides. • Self-reported
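The per-language score in this scheme is just tasks-passed over tasks-attempted for each language. A minimal sketch of the scoring step (actually running generated code through six different toolchains is elided here):

```python
from collections import defaultdict

def pass_rates(results: list[tuple[str, bool]]) -> dict[str, float]:
    """results: (language, test_suite_passed) per task -> pass rate per language."""
    passed, total = defaultdict(int), defaultdict(int)
    for lang, ok in results:
        total[lang] += 1
        passed[lang] += ok
    return {lang: passed[lang] / total[lang] for lang in total}

results = [("Python", True), ("Python", True), ("Python", False),
           ("Go", True), ("Go", False)]
print(pass_rates(results))  # Python: 2/3, Go: 1/2
```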
Vibe-Eval
Evaluation of visual understanding. AI systems have made significant progress in visual perception and understanding. This evaluation tests the model's ability to accurately interpret visual content, reason about visual information, and answer questions based on visual inputs. The evaluation covers a range of tasks from basic image recognition to complex reasoning about visual scenes. Key capabilities tested include: 1. Basic object recognition and scene understanding 2. Spatial reasoning about object relationships 3. Action recognition in images 4. Understanding of visual attributes (color, size, shape) 5. Visual question answering 6. Complex reasoning based on visual input 7. Multi-frame or temporal reasoning 8. Fine-grained discrimination between similar visual concepts 9. Understanding of charts, diagrams, and other specialized visual formats The evaluation uses a diverse set of images, including natural photographs, illustrations, diagrams, charts, and specialized visualizations. Questions range from simple ("What objects are in this image?") to complex ("What logical inference can you make about the relationship between these elements?") • Self-reported
Video-MME
Accuracy of video understanding and analysis. • Self-reported
WMT23
Evaluation of translation quality. To evaluate translation quality into Russian we use a two-part method: first we score correctness, then fluency. Correctness of translation (5 points): 5 — a fully accurate translation that conveys all nuances of the text; 4 — an accurate translation on the whole, with minor errors that do not affect understanding; 3 — a translation with several errors that affect understanding; 2 — a translation with errors that significantly distort the meaning; 1 — a translation with errors that fully misrepresent the text. Fluency of translation (5 points): 5 — reads like text originally written in the target language; 4 — a fluent translation on the whole, with minor awkwardness; 3 — noticeably awkward, but understandable as a whole; 2 — awkward expressions that clearly read as a translation; 1 — a barely readable translation. Overall translation quality is the sum of the correctness and fluency scores (10 points). • Self-reported
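The two-rubric scheme combines a correctness score and a fluency score, each on a 1–5 scale, into a 10-point total per segment. A minimal sketch (averaging segment scores into a corpus-level result is an assumption; the card does not specify the aggregation):

```python
def segment_score(correctness: int, fluency: int) -> int:
    """Combine the two 1-5 rubric scores into a 10-point segment score."""
    assert 1 <= correctness <= 5 and 1 <= fluency <= 5
    return correctness + fluency

def corpus_score(ratings: list[tuple[int, int]]) -> float:
    """Assumed aggregation: mean 10-point score over all rated segments."""
    return sum(segment_score(c, f) for c, f in ratings) / len(ratings)

print(corpus_score([(5, 4), (4, 4), (3, 5)]))  # (9 + 8 + 8) / 3, about 8.33
```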
XSTest
Rate of executing benign queries (a measure of over-refusal on safe prompts that superficially resemble unsafe ones). • Self-reported
License & Metadata
License
proprietary
Announcement Date
March 15, 2024
Last Updated
July 19, 2025
Similar Models
Gemma 3n E2B Instructed
Multimodal · 8.0B
Best score: 0.7 (HumanEval)
Released: Jun 2025
Gemma 3n E2B
Multimodal · 8.0B
Best score: 0.5 (ARC)
Released: Jun 2025
MedGemma 4B IT
Multimodal · 4.3B
Released: May 2025
Gemma 3 4B
Multimodal · 4.0B
Best score: 0.7 (HumanEval)
Released: Mar 2025
Price: $0.02/1M tokens
Gemma 3n E2B Instructed LiteRT (Preview)
Multimodal · 1.9B
Best score: 0.7 (HumanEval)
Released: May 2025
Gemma 3n E4B Instructed
Multimodal · 8.0B
Best score: 0.8 (HumanEval)
Released: Jun 2025
Price: $20.00/1M tokens
Gemma 3n E4B Instructed LiteRT Preview
Multimodal · 1.9B
Best score: 0.8 (HumanEval)
Released: May 2025
Gemma 3n E4B
Multimodal · 8.0B
Best score: 0.6 (ARC)
Released: Jun 2025
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.