Key Specifications
Parameters
7.6B
Context
-
Release Date
July 23, 2024
Average Score
59.5%
Timeline
Key dates in the model's history
Announcement
July 23, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
7.6B
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
Accuracy • Self-reported
Programming
Programming skills tests
HumanEval
Pass@1 The Pass@1 metric measures the probability that a solution is correct on the first attempt. Unlike Pass@k, which gives the model k attempts, Pass@1 allows only one. A high Pass@1 score means the model can generate correct solutions without needing several tries, which matters for real applications where users usually rely on the first answer and cannot check multiple options. To compute Pass@1, the single attempt is checked for correctness, for example by executing the code or by comparing against reference answers. The metric is especially useful when the reliability of the first answer is important, for example in decision-making settings. • Self-reported
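A minimal sketch of how such a single-attempt check could be implemented (illustrative Python, not the official HumanEval or MBPP harness; real harnesses run candidate code in a sandboxed subprocess with a timeout):

    def passes_tests(candidate_code: str, test_code: str) -> bool:
        # Run one generated solution against the task's assert-based unit tests.
        namespace: dict = {}
        try:
            exec(candidate_code, namespace)  # define the candidate function
            exec(test_code, namespace)       # run the asserts
            return True
        except Exception:
            return False

    def pass_at_1(first_attempts: list[tuple[str, str]]) -> float:
        # first_attempts: exactly one (candidate_code, test_code) pair per task.
        results = [passes_tests(code, tests) for code, tests in first_attempts]
        return sum(results) / len(results)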
MBPP
Pass@1 The Pass@1 metric evaluates model performance as the percentage of tasks the model solves on its first attempt. It is especially important for assessing a model's ability to complete tasks without repeated attempts or iterations. A high Pass@1 score speaks to the model's reliability and its ability to deliver accurate results without extra tries. Pass@1 is often used in programming and mathematical benchmarks, where the correctness of a solution can be determined automatically. It gives a more realistic picture of a model's practical capabilities than metrics that allow several attempts, such as Pass@k for k > 1. • Self-reported
Mathematics
Mathematical problems and computations
GSM8k
Accuracy • Self-reported
MATH
Accuracy • Self-reported
Reasoning
Logical reasoning and analysis
GPQA
Accuracy • Self-reported
Other Tests
Specialized benchmarks
AlignBench
Score Responses are judged on a scale from 0 to 10, where 0 means a completely incorrect solution and 10 a fully correct one. The judge evaluates the task and the model's reasoning step by step, considering not only the final answer but also the method and justification, noting any errors and what could be improved. • Self-reported
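A minimal sketch of judge-based 0-10 scoring along these lines (the judge callable and the prompt wording are hypothetical assumptions, not AlignBench's actual protocol):

    import re

    JUDGE_PROMPT = (
        "Evaluate the solution step by step, considering the method and "
        "justification, not only the final answer. Reply with a single integer "
        "score from 0 (completely incorrect) to 10 (fully correct).\n\n"
        "Question:\n{question}\n\nSolution:\n{solution}\n\nScore:"
    )

    def score_response(question: str, solution: str, judge) -> int:
        # `judge` is any callable that sends a prompt to a judge model and
        # returns its text reply (placeholder, not a specific API).
        reply = judge(JUDGE_PROMPT.format(question=question, solution=solution))
        match = re.search(r"\d+", reply)
        return min(10, max(0, int(match.group()))) if match else 0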
C-Eval
Accuracy • Self-reported
EvalPlus
Pass@1 This score reflects how effective the model is at solving code-generation problems: the percentage of tasks it solves on its first attempt. To compute it, the model generates n attempts per task, the number of correct solutions c among them is counted, and the unbiased estimate pass@1 = c/n is reported (in general, pass@k = 1 - C(n-c, k)/C(n, k)). This makes the score comparable across models regardless of how many samples are drawn, and it is widely used to compare code-generation performance. • Self-reported
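A small Python sketch of the standard unbiased pass@k estimator (Chen et al., 2021); whether EvalPlus uses exactly this estimator is an assumption, but for k = 1 it reduces to c/n as described above:

    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        # n = samples generated per task, c = correct samples, k = attempts allowed.
        if n - c < k:
            return 1.0
        return 1.0 - comb(n - c, k) / comb(n, k)

    # Example: 10 samples per task, 3 of them correct -> pass@1 = 3/10.
    assert abs(pass_at_k(10, 3, 1) - 0.3) < 1e-9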
LiveCodeBench
Score • Self-reported
MMLU-Pro
Accuracy • Self-reported
MT-Bench
Score • Self-reported
MultiPL-E
Pass@1 The Pass@1 metric measures what percentage of test cases the model solves on its first attempt; higher values mean better performance. Unlike other methods, such as generating several solutions in parallel and choosing the most common one (self-consistency) or trying several prompt variants, Pass@1 evaluates the model's ability to generate a correct answer immediately, without multiple attempts. This is especially relevant for real scenarios where users need correct solutions without repeated queries or extra computation. • Self-reported
TheoremQA
Accuracy • Self-reported
License & Metadata
License
Apache 2.0
Announcement Date
July 23, 2024
Last Updated
July 19, 2025
Similar Models
Qwen2.5 7B Instruct
Alibaba
7.6B
Best score: 0.8 (HumanEval)
Released: Sep 2024
Price: $0.30/1M tokens
Qwen3.5 9B
Alibaba
9.0B
Released: Mar 2026
Qwen2.5-Coder 7B Instruct
Alibaba
7.0B
Best score: 0.9 (HumanEval)
Released: Sep 2024
Qwen3-235B-A22B-Instruct-2507
Alibaba
235.0B
Best score: 0.8 (GPQA)
Released: Jul 2025
Price: $0.15/1M tokens
Qwen3 Max
Alibaba
Best score: 0.6 (GPQA)
Released: Dec 2025
Qwen2.5-Omni-7B
Alibaba
7.0B (multimodal)
Best score: 0.8 (HumanEval)
Released: Mar 2025
Llama 3.1 8B Instruct
Meta
8.0B
Best score: 0.8 (ARC)
Released: Jul 2024
Price: $0.20/1M tokens
Ministral 8B Instruct
Mistral AI
8.0B
Best score: 0.7 (ARC)
Released: Oct 2024
Price: $0.10/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.