
Qwen2.5-Coder 7B Instruct

Alibaba

Qwen2.5-Coder is a specialized coding model trained on 5.5 trillion tokens of code data, supporting 92 programming languages with a 128K context window. It excels at code generation, completion, and fixing while maintaining high performance in math and general tasks. The model demonstrates exceptional capabilities in multi-language programming tasks and code reasoning.
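As an instruct model, Qwen2.5-Coder 7B Instruct is prompted with a chat-formatted input. The sketch below assembles a ChatML-style prompt of the kind Qwen's chat template produces; the `<|im_start|>`/`<|im_end|>` delimiter strings are an assumption based on the Qwen template (in practice `tokenizer.apply_chat_template` builds this for you), and the helper name is illustrative.

```python
def build_chatml_prompt(messages):
    """Assemble a ChatML-style prompt from (role, content) pairs.

    Assumes the <|im_start|>/<|im_end|> delimiters used by Qwen's chat
    template; normally tokenizer.apply_chat_template handles this step.
    """
    parts = []
    for role, content in messages:
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    # Trailing generation header: the model continues as "assistant".
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    ("system", "You are a helpful coding assistant."),
    ("user", "Write a function that reverses a string."),
])
print(prompt)
```

The resulting string is what gets tokenized and fed to the model; generation stops when the model emits its own end-of-turn marker.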

Key Specifications

Parameters
7.0B
Context
128K tokens
Release Date
September 19, 2024
Average Score
58.0%

Timeline

Key dates in the model's history
Announcement
September 19, 2024
Last Update
July 19, 2025

Technical Specifications

Parameters
7.0B
Training Tokens
5.5T tokens
Knowledge Cutoff
-
Family
-
Fine-tuned from
qwen-2.5-7b-instruct
Capabilities
Multimodal, ZeroEval

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
HellaSwag (accuracy, self-reported): 76.8%
MMLU (accuracy, self-reported): 67.6%
TruthfulQA (accuracy, self-reported): 50.6%
Winogrande (accuracy, self-reported): 72.9%

Programming

Programming skills tests
HumanEval (pass@1, self-reported): 88.4%
MBPP (pass@1, self-reported): 83.5%

pass@1 gives the model a single attempt per problem and scores 1 if that first answer is correct, 0 otherwise. Unlike pass@k, the model cannot generate several candidate solutions and pick the best one, so pass@1 is a strict measure of whether the model finds a correct solution on the first try; a high score indicates it understands the domain well enough to produce exact solutions without additional attempts.
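The pass@1 figures reported here can be reproduced with the standard unbiased pass@k estimator from the Codex evaluation setup, pass@k = 1 - C(n-c, k)/C(n, k), where n solutions are sampled per problem and c of them pass the tests. A minimal sketch (function and variable names are illustrative):

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations (c correct), passes.

    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        return 1.0  # too few failures to fill k samples: guaranteed pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# With one sample per problem (n = k = 1), pass@1 reduces to the
# fraction of problems solved on the first attempt.
first_attempt_ok = [True, True, False, True]
pass_at_1 = sum(pass_at_k(1, int(ok), 1) for ok in first_attempt_ok) / len(first_attempt_ok)
```

Averaging this estimator over every problem in a benchmark yields the percentage shown on lines like HumanEval and MBPP above.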

Mathematics

Mathematical problems and computations
GSM8k (accuracy, self-reported): 83.9%
MATH (accuracy, self-reported): 46.6%

Other Tests

Specialized benchmarks
Aider (pass@1, self-reported): 55.6%
ARC-C (accuracy, self-reported): 60.9%
BigCodeBench (accuracy, self-reported): 41.0%
CRUXEval-Input-CoT (accuracy, self-reported): 56.5%
CRUXEval-Output-CoT (accuracy, self-reported): 56.0%
LiveCodeBench (pass@1, self-reported): 18.2%
MMLU-Base (accuracy, self-reported): 68.0%
MMLU-Pro (accuracy, self-reported): 40.1%
MMLU-Redux (accuracy, self-reported): 66.6%
STEM (accuracy, self-reported): 34.0%
TheoremQA (accuracy, self-reported): 34.0%

pass@1 here likewise counts only the model's first attempt per problem; sample-and-rank schemes such as majority voting, which can raise scores by generating several candidates and selecting the best, are excluded from this metric.
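The 58.0% average score reported at the top of this page appears to be the simple arithmetic mean of the 19 self-reported benchmark scores listed above; a quick check of that assumption (pure arithmetic over the values on this page):

```python
# The 19 self-reported benchmark scores from this model card.
scores = {
    "HellaSwag": 76.8, "MMLU": 67.6, "TruthfulQA": 50.6, "Winogrande": 72.9,
    "HumanEval": 88.4, "MBPP": 83.5,
    "GSM8k": 83.9, "MATH": 46.6,
    "Aider": 55.6, "ARC-C": 60.9, "BigCodeBench": 41.0,
    "CRUXEval-Input-CoT": 56.5, "CRUXEval-Output-CoT": 56.0,
    "LiveCodeBench": 18.2, "MMLU-Base": 68.0, "MMLU-Pro": 40.1,
    "MMLU-Redux": 66.6, "STEM": 34.0, "TheoremQA": 34.0,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.1f}%")
```

The unweighted mean rounds to 58.0%, matching the headline figure, which suggests no per-category weighting is applied.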

License & Metadata

License
Apache 2.0
Announcement Date
September 19, 2024
Last Updated
July 19, 2025
