Key Specifications
Parameters
8.0B
Context
1.0M
Release Date
March 15, 2024
Average Score
60.5%
Timeline
Key dates in the model's history
Announcement
March 15, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
8.0B
Training Tokens
-
Knowledge Cutoff
October 1, 2024
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$0.07
Output (per 1M tokens)
$0.30
Max Input Tokens
1.0M
Max Output Tokens
8.2K
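At the listed rates ($0.07 per 1M input tokens, $0.30 per 1M output tokens), the cost of a request is a simple linear function of the token counts. A minimal sketch:

```python
# Per-token pricing taken from the card above (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.07
OUTPUT_PRICE_PER_M = 0.30

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 100K-token prompt with an 8K-token completion.
cost = request_cost(100_000, 8_000)
print(f"${cost:.4f}")  # 0.007 + 0.0024 = $0.0094
```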
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
Mathematics
Mathematical problems and computations
MATH
Accuracy in solving mathematical tasks. We evaluate the model's mathematical abilities on a set of assignments grouped by mathematics level, such as problems from the American Invitational Mathematics Examination (AIME) and the USA Mathematical Olympiad (USAMO). These tasks require a deep understanding of mathematical concepts, structured thinking, and a sound approach to problem solving. We compare the model's answers with reference solutions using the following criteria: correctness of the final answer, accuracy of the mathematical reasoning at each step, appropriate use of mathematical methods, and clarity of explanations. Evaluation is performed automatically where possible, and with expert review for complex cases. We also analyze the errors the model makes and classify them by type (computational errors, logical errors, and so on). These metrics let us determine how well the model can solve tasks requiring mathematical knowledge, and track progress in this field compared with earlier versions and other models. • Self-reported
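The answer-comparison step described above can be sketched as exact-match accuracy over normalized final answers (a toy illustration; real graders for AIME/USAMO-style problems also check the reasoning steps, which this sketch omits):

```python
def normalize(answer: str) -> str:
    """Crude normalization: strip whitespace, a leading '$', a trailing '.'."""
    return answer.strip().lstrip("$").rstrip(".").replace(" ", "")

def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions whose normalized form equals the reference's."""
    assert len(predictions) == len(references)
    correct = sum(normalize(p) == normalize(r)
                  for p, r in zip(predictions, references))
    return correct / len(references)

preds = ["42", " 3/4 ", "x = 7"]
refs  = ["42", "3/4",   "x=7"]
print(exact_match_accuracy(preds, refs))  # 1.0
```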
Reasoning
Logical reasoning and analysis
GPQA
Accuracy of answers to expert-level scientific questions. The model is evaluated on 100 questions selected from the Google-Proof Questions and Answers (GPQA) set of roughly 600 scientific questions written by domain experts to verify model knowledge. GPQA questions are constructed so that simple lookup does not reveal the answer, which makes them a measure of the knowledge the model can actually use when answering. The questions span various scientific disciplines. Each model answer is compared with the GPQA reference answer to determine its accuracy; an answer counts as correct if it matches the reference, even if it is phrased differently. • Self-reported
Multimodal
Working with images and visual data
MathVista
Accuracy of mathematical reasoning over visual inputs such as charts, plots, and diagrams. • Self-reported
MMMU
Accuracy of multimodal understanding. The model answers questions about images, charts, and other visual materials, which requires interpreting images correctly, extracting textual and numerical data, and combining visual information with reasoning. Strong results reflect exact description of key image elements, correct reading of numerical data, and sound use of visual information in answers; weak results show up as errors in reading text or numerical data from images, or failure on questions that require genuine visual understanding. • Self-reported
Other Tests
Specialized benchmarks
FLEURS
Speech recognition accuracy, measured as 1 − WER (word error rate). • Self-reported
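WER is the word-level edit distance between the hypothesis transcript and the reference, divided by the reference length; FLEURS reports 1 − WER so that higher is better. A minimal sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[-1][-1] / len(ref)

ref = "the cat sat on the mat"
hyp = "the cat sat on mat"      # one word dropped -> 1 error over 6 words
print(1 - wer(ref, hyp))        # 1 - 1/6, about 0.833
```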
HiddenMath
Accuracy of solutions to competition-level mathematical tasks. AI models are measured on their ability to solve challenging math problems selected from prestigious competitions such as the AIME, FrontierMath, or the Harvard-MIT Mathematics Tournament. These problems typically require multi-step reasoning, creative application of mathematical concepts, and formal symbolic manipulation. Evaluation focuses on both the final answer accuracy and the correctness of the solution path, including intermediate steps and justifications. Problems may span various fields of mathematics, including algebra, number theory, geometry, and combinatorics. This benchmark is particularly valuable for assessing an AI's: - Formal reasoning capabilities - Understanding of mathematical concepts - Ability to organize complex, multi-step solutions - Mathematical precision and rigor Performance is often reported as the percentage of problems solved correctly, sometimes broken down by difficulty level or mathematical domain. • Self-reported
MMLU-Pro
Accuracy of choosing the correct answer from several options on MMLU-Pro, an extension of the MMLU dataset with tasks of increased complexity. • Self-reported
MRCR
Accuracy of long-context understanding. We measure the model's ability to answer questions accurately using information placed somewhere in a long context. The approach is to give the model a long document and then ask questions whose answers are contained in it. We check how accuracy depends on: where in the context the information appears (near the beginning or the end), and the query type (a direct lookup of the information, or a query that requires reasoning on top of it). We measure two types of accuracy: 1. Extraction accuracy: can the model find the information in the context? 2. Understanding accuracy: can the model draw correct conclusions based on the information in the context? • Self-reported
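A toy version of the placement experiment in this kind of long-context test: insert a known fact (the "needle") at different positions in filler text, query the model, and record extraction accuracy per position. The `ask_model` callable here is a hypothetical stand-in for a real model call, not part of any actual API.

```python
def build_context(filler_sentences: list[str], needle: str, position: float) -> str:
    """Insert the needle at a relative position (0.0 = start, 1.0 = end)."""
    idx = round(position * len(filler_sentences))
    return " ".join(filler_sentences[:idx] + [needle] + filler_sentences[idx:])

def extraction_accuracy(ask_model, filler, needle, question, answer, positions):
    """Fraction of needle positions at which the model's reply contains the answer."""
    hits = 0
    for pos in positions:
        prompt = build_context(filler, needle, pos) + "\n\n" + question
        reply = ask_model(prompt)
        hits += answer.lower() in reply.lower()
    return hits / len(positions)

# Example with a trivial 'model' that just echoes its prompt back:
filler = [f"Filler sentence number {i}." for i in range(100)]
acc = extraction_accuracy(lambda prompt: prompt,
                          filler,
                          needle="The secret code is 7421.",
                          question="What is the secret code?",
                          answer="7421",
                          positions=[0.0, 0.5, 1.0])
print(acc)  # 1.0 -- the echoed prompt always contains the needle
```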
Natural2Code
Success rate on code-generation tasks across programming languages. We analyze how well the model performs code generation in different languages, measured as the proportion of successfully completed tasks per language: Python, JavaScript, Java, C++, Go, Rust. Method: 1. A set of 20 programming tasks for each language. 2. Tasks include: algorithmic problems (search, etc.), work with data structures, work with APIs, and error handling. 3. For each task, the model produces a solution, and the code is checked for correctness via tests. 4. The percentage of passing solutions is computed for each language. 5. Errors and general patterns are analyzed. This evaluation measures the model's ability to generate code in different programming languages and reveals its specific strong and weak sides. • Self-reported
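The per-language score in this scheme is just tasks-passed over tasks-attempted for each language. A minimal sketch of the scoring step (actually running generated code through six different toolchains is elided here):

```python
from collections import defaultdict

def pass_rates(results: list[tuple[str, bool]]) -> dict[str, float]:
    """results: (language, test_suite_passed) per task -> pass rate per language."""
    passed, total = defaultdict(int), defaultdict(int)
    for lang, ok in results:
        total[lang] += 1
        passed[lang] += ok
    return {lang: passed[lang] / total[lang] for lang in total}

results = [("Python", True), ("Python", True), ("Python", False),
           ("Go", True), ("Go", False)]
print(pass_rates(results))  # Python: 2/3, Go: 1/2
```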
Vibe-Eval
Evaluation of visual understanding. AI systems have made significant progress in visual perception and understanding. This evaluation tests the model's ability to accurately interpret visual content, reason about visual information, and answer questions based on visual inputs. The evaluation covers a range of tasks from basic image recognition to complex reasoning about visual scenes. Key capabilities tested include: 1. Basic object recognition and scene understanding 2. Spatial reasoning about object relationships 3. Action recognition in images 4. Understanding of visual attributes (color, size, shape) 5. Visual question answering 6. Complex reasoning based on visual input 7. Multi-frame or temporal reasoning 8. Fine-grained discrimination between similar visual concepts 9. Understanding of charts, diagrams, and other specialized visual formats The evaluation uses a diverse set of images, including natural photographs, illustrations, diagrams, charts, and specialized visualizations. Questions range from simple ("What objects are in this image?") to complex ("What logical inference can you make about the relationship between these elements?") • Self-reported
Video-MME
Accuracy of video understanding and analysis. • Self-reported
WMT23
Evaluation of translation quality. To evaluate translation quality into Russian we use a two-part method: first we score correctness, then fluency. Correctness of translation (5 points): 5 — a fully accurate translation that conveys all nuances of the text; 4 — an accurate translation on the whole, with minor errors that do not affect understanding; 3 — a translation with several errors that affect understanding; 2 — a translation with errors that significantly distort the meaning; 1 — a translation with errors that fully misrepresent the text. Fluency of translation (5 points): 5 — reads like text originally written in the target language; 4 — a fluent translation on the whole, with minor awkwardness; 3 — noticeably awkward, but understandable as a whole; 2 — awkward expressions that clearly read as a translation; 1 — a barely readable translation. Overall translation quality is the sum of the correctness and fluency scores (10 points). • Self-reported
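The two-rubric scheme combines a correctness score and a fluency score, each on a 1–5 scale, into a 10-point total per segment. A minimal sketch (averaging segment scores into a corpus-level result is an assumption; the card does not specify the aggregation):

```python
def segment_score(correctness: int, fluency: int) -> int:
    """Combine the two 1-5 rubric scores into a 10-point segment score."""
    assert 1 <= correctness <= 5 and 1 <= fluency <= 5
    return correctness + fluency

def corpus_score(ratings: list[tuple[int, int]]) -> float:
    """Assumed aggregation: mean 10-point score over all rated segments."""
    return sum(segment_score(c, f) for c, f in ratings) / len(ratings)

print(corpus_score([(5, 4), (4, 4), (3, 5)]))  # (9 + 8 + 8) / 3, about 8.33
```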
XSTest
Rate of executing benign queries (a measure of over-refusal on safe prompts that superficially resemble unsafe ones). • Self-reported
License & Metadata
License
proprietary
Announcement Date
March 15, 2024
Last Updated
July 19, 2025
Similar Models
Gemma 3n E2B Instructed
Multimodal · 8.0B
Best score: 0.7 (HumanEval)
Released: Jun 2025
Gemma 3n E2B
Multimodal · 8.0B
Best score: 0.5 (ARC)
Released: Jun 2025
MedGemma 4B IT
Multimodal · 4.3B
Released: May 2025
Gemma 3 4B
Multimodal · 4.0B
Best score: 0.7 (HumanEval)
Released: Mar 2025
Price: $0.02/1M tokens
Gemma 3n E2B Instructed LiteRT (Preview)
Multimodal · 1.9B
Best score: 0.7 (HumanEval)
Released: May 2025
Gemma 3n E4B Instructed
Multimodal · 8.0B
Best score: 0.8 (HumanEval)
Released: Jun 2025
Price: $20.00/1M tokens
Gemma 3n E4B Instructed LiteRT Preview
Multimodal · 1.9B
Best score: 0.8 (HumanEval)
Released: May 2025
Gemma 3n E4B
Multimodal · 8.0B
Best score: 0.6 (ARC)
Released: Jun 2025
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.