
Qwen2.5-Coder 32B Instruct

Alibaba

Qwen2.5-Coder is a specialized coding model trained on 5.5 trillion tokens of code data, supporting 92 programming languages with a 128K token context window. The model excels at code generation, autocomplete, bug fixing, and multilingual programming tasks while maintaining high performance in math and general tasks.

Key Specifications

Parameters
32.0B
Context
128.0K
Release Date
September 19, 2024
Average Score
64.9%

Timeline

Key dates in the model's history
Announcement
September 19, 2024
Last Update
July 19, 2025

Technical Specifications

Parameters
32.0B
Training Tokens
5.5T tokens
Knowledge Cutoff
-
Family
-
Fine-tuned from
qwen-2.5-32b-instruct
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$0.09
Output (per 1M tokens)
$0.09
Max Input Tokens
128.0K
Max Output Tokens
128.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
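With identical input and output rates, estimating request cost is a single weighted sum. A minimal Python sketch using the listed prices; the token counts in the example are illustrative values, not figures from this card:

```python
# Listed Qwen2.5-Coder 32B Instruct rates: $0.09 per 1M tokens, input and output.
INPUT_PRICE_PER_M = 0.09   # USD per 1,000,000 input tokens
OUTPUT_PRICE_PER_M = 0.09  # USD per 1,000,000 output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of a single request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 4,000-token prompt producing a 1,000-token completion.
print(f"${request_cost(4_000, 1_000):.6f}")  # → $0.000450
```

Because both rates are equal, only the total token count matters here; with asymmetric pricing the split between prompt and completion would change the result.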

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
HellaSwag
accuracy (self-reported)
83.0%
MMLU
accuracy (self-reported)
75.1%
TruthfulQA
accuracy (self-reported)
54.2%
Winogrande
accuracy (self-reported)
80.8%

Programming

Programming skills tests
HumanEval
pass@1 (self-reported) — checks whether the model solves a task on its first attempt: a task counts as solved only if the first generated answer is correct, with no credit for later retries. The metric ignores any improvement from additional attempts and reflects the model's base single-shot ability.
92.7%
MBPP
pass@1 (self-reported) — for each task the model gets one attempt and scores 1 if that first answer contains a correct solution, 0 otherwise; the benchmark score is the average over all tasks. Unlike pass@k, which credits a correct answer anywhere among k samples, pass@1 demands correctness on the first try, making it a strict measure of reliability for settings where users cannot verify multiple candidate answers.
90.2%
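Scores like these are commonly computed with the unbiased pass@k estimator: draw n samples per task, count the c correct ones, and estimate the chance that at least one of k drawn samples passes. A minimal sketch; the sample counts in the example are illustrative, not the ones used for Qwen2.5-Coder:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k given n generated samples per task,
    of which c are correct: 1 - C(n-c, k) / C(n, k).
    """
    if n - c < k:
        # Fewer than k incorrect samples: any k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples for one task, 9 of them correct, evaluated at k=1.
print(pass_at_k(10, 9, 1))  # → 0.9
```

For k=1 the formula reduces to c/n, the fraction of correct first attempts, which is why pass@1 can also be read as plain first-try accuracy averaged over tasks.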

Mathematics

Mathematical problems and computations
GSM8k
accuracy (self-reported)
91.1%
MATH
accuracy (self-reported)
57.2%

Other Tests

Specialized benchmarks
ARC-C
accuracy (self-reported)
70.5%
BigCodeBench-Full
accuracy (self-reported)
49.6%
BigCodeBench-Hard
accuracy (self-reported)
27.0%
LiveCodeBench
pass@1 (self-reported) — measures performance when the model is allowed only one attempt per task; for example, answering 75 of 100 questions correctly on the first try gives pass@1 = 75%. This is stricter than pass@k, where k answers are generated and a single correct one suffices, and it matters most for applications that need one definitive answer rather than several candidates.
31.4%
MMLU-Pro
accuracy (self-reported)
50.4%
MMLU-Redux
accuracy (self-reported)
77.5%
TheoremQA
accuracy (self-reported)
43.1%

License & Metadata

License
Apache 2.0
Announcement Date
September 19, 2024
Last Updated
July 19, 2025
