Qwen2.5-Coder 32B Instruct
Qwen2.5-Coder is a specialized coding model trained on 5.5 trillion tokens of code data, supporting 92 programming languages with a 128K token context window. The model excels at code generation, autocomplete, bug fixing, and multilingual programming tasks while maintaining high performance in math and general tasks.
Key Specifications
Parameters
32.0B
Context
128.0K
Release Date
September 19, 2024
Average Score
64.9%
Timeline
Key dates in the model's history
Announcement
September 19, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
32.0B
Training Tokens
5.5T tokens
Knowledge Cutoff
-
Family
-
Fine-tuned from
qwen-2.5-32b-instruct
Capabilities
Multimodal · ZeroEval
Pricing & Availability
Input (per 1M tokens)
$0.09
Output (per 1M tokens)
$0.09
Max Input Tokens
128.0K
Max Output Tokens
128.0K
Supported Features
Function Calling · Structured Output · Code Execution · Web Search · Batch Inference · Fine-tuning
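Using the per-token prices listed above ($0.09 per 1M tokens for both input and output), the cost of a request can be estimated as a simple weighted sum. A minimal sketch; the `estimate_cost` helper is illustrative, not part of any official API, and actual provider billing may differ.

```python
# Rough per-request cost estimate from the page's listed prices.
# These constants mirror the "Pricing & Availability" values above.
INPUT_PRICE_PER_M = 0.09   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.09  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a 10K-token prompt with a 2K-token completion:
print(f"${estimate_cost(10_000, 2_000):.6f}")
```

At these rates even a full 128K-token context costs only about a cent per request, which is why symmetric input/output pricing is workable here.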
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
HellaSwag
accuracy • Self-reported
MMLU
accuracy • Self-reported
TruthfulQA
accuracy • Self-reported
Winogrande
accuracy • Self-reported
Programming
Programming skills tests
HumanEval
pass@1 — Measures whether the model solves a task on its first attempt. A task counts as solved only if the first generated answer is correct; incorrect first answers are not retried. The metric therefore ignores any improvement that additional attempts might bring and reflects the model's base ability to solve tasks in a single shot. • Self-reported
MBPP
pass@1 — For each task the model gets a single attempt and scores 1 if that first answer contains a correct solution, 0 otherwise; the reported score is the average over all tasks. Unlike pass@k, which estimates the probability of at least one correct answer among k samples, pass@1 measures the ability to produce a correct solution on the first try. This stricter criterion matters for reliability-sensitive applications where users lack the resources to verify multiple candidate answers. • Self-reported
Mathematics
Mathematical problems and computations
GSM8k
accuracy • Self-reported
MATH
accuracy • Self-reported
Other Tests
Specialized benchmarks
ARC-C
accuracy • Self-reported
BigCodeBench-Full
accuracy • Self-reported
BigCodeBench-Hard
accuracy • Self-reported
LiveCodeBench
pass@1 — Performance when the model is allowed only one attempt per task; the standard score for benchmarks that require an exact answer. For example, if the model answers 75 of 100 questions correctly on the first try, pass@1 is 75%. Unlike pass@k, where the model generates k different answers and counts as correct if at least one is right, pass@1 measures strict first-attempt accuracy, which matters for applications that need a single definitive answer rather than several options. • Self-reported
MMLU-Pro
accuracy • Self-reported
MMLU-Redux
accuracy • Self-reported
TheoremQA
accuracy • Self-reported
License & Metadata
License
Apache 2.0
Announcement Date
September 19, 2024
Last Updated
July 19, 2025
Similar Models
Qwen2.5 32B Instruct
Alibaba
32.5B
Best score: 0.9 (HumanEval)
Released: Sep 2024
Qwen3 32B
Alibaba
32.8B
Released: Apr 2025
Price: $0.40/1M tokens
Qwen3.5 27B
Alibaba
27.0B
Released: Mar 2026
Qwen3.5 35B A3B
Alibaba
35.0B
Released: Mar 2026
Qwen3-Next-80B-A3B-Instruct
Alibaba
80.0B
Released: Sep 2025
Price: $0.15/1M tokens
Qwen2 72B Instruct
Alibaba
72.0B
Best score: 0.9 (HumanEval)
Released: Jul 2024
Qwen2.5 14B Instruct
Alibaba
14.7B
Best score: 0.8 (HumanEval)
Released: Sep 2024
Qwen2.5-Coder 7B Instruct
Alibaba
7.0B
Best score: 0.9 (HumanEval)
Released: Sep 2024
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.