DeepSeek R1 Distill Qwen 32B
DeepSeek-R1-Distill-Qwen-32B is a dense 32B model distilled from DeepSeek-R1, DeepSeek's first-generation reasoning model built on DeepSeek-V3 (671 billion total parameters, 37 billion activated per token). DeepSeek-R1 uses large-scale reinforcement learning (RL) to improve chain-of-thought reasoning and logical thinking; the distilled model, fine-tuned from Qwen2.5-32B on reasoning traces generated by R1, inherits much of that performance in math, coding, and multi-step reasoning tasks.
Key Specifications
Parameters
32.8B
Context
128.0K
Release Date
January 20, 2025
Average Score
74.2%
Timeline
Key dates in the model's history
Announcement
January 20, 2025
Last Update
July 19, 2025
Technical Specifications
Parameters
32.8B
Training Tokens
14.8T tokens
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$0.12
Output (per 1M tokens)
$0.18
Max Input Tokens
128.0K
Max Output Tokens
128.0K
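As a quick sanity check on the rates above, per-request cost follows directly from the per-million-token prices. The following is a minimal Python sketch; the token counts in the example are made-up illustrative numbers, not values from this page.

```python
# Per-million-token rates from the pricing table above.
INPUT_PER_M = 0.12   # USD per 1M input tokens
OUTPUT_PER_M = 0.18  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single request at the listed rates."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# Hypothetical example: 4,000-token prompt, 12,000-token reasoning answer.
print(f"${request_cost(4_000, 12_000):.6f}")  # $0.002640
```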
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
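Since function calling and structured output are listed as supported, tool use through an OpenAI-compatible chat endpoint might look like the sketch below. The base URL, API key, model identifier, and get_weather tool are illustrative assumptions, not values taken from this page; check your provider's documentation for the actual model id.

```python
from openai import OpenAI

# Hypothetical OpenAI-compatible endpoint serving the model.
client = OpenAI(base_url="https://example-provider/v1", api_key="YOUR_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not part of the model card
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="deepseek-r1-distill-qwen-32b",  # provider-specific model id (assumed)
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```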
Benchmark Results
Model performance metrics across various tests and benchmarks
Reasoning
Logical reasoning and analysis
GPQA
Diamond, Pass@1. GPQA results are reported on the Diamond subset, scored with Pass@1. The model describes its solving procedure in six steps: 1. Task analysis: decompose the task into its components and identify the key facts, constraints, and the question being asked. 2. Strategy generation: consider several possible solution approaches and select the methods best suited to the given problem. 3. Solution: carry the chosen approach through clearly defined stages, executing the required sequence of mathematical operations. 4. Verification: check the solution from several angles, including edge cases and alternative approaches, to confirm its reliability. 5. Final-answer evaluation: assess the result for plausibility and correctness and confirm that the answer actually addresses the task. 6. Independent re-check: rework the entire solution from scratch, as if seeing the task for the first time, to catch residual errors or oversights. This structured approach allows complex tasks to be solved thoroughly while reducing the probability of errors; the independent re-check at the end is critically important for catching mistakes. • Self-reported
Other Tests
Specialized benchmarks
AIME 2024
Cons@64. Consensus@64 samples 64 chain-of-thought reasoning paths per problem and takes the majority answer. Unlike standard single-pass decoding, this technique bases its conclusion on many reasoning chains, producing solutions with higher accuracy. Each step of a chain should contain 3 key elements: 1. the context for the step, 2. the step itself, derived from the preceding analysis, 3. a transition to the next step. This keeps the reasoning anchored in logical steps: each output builds directly on its context and makes progress toward the solution. Cons@64 is especially effective for complex mathematical tasks, programming assignments, logical proofs, and other hard problems that require step-by-step reasoning. • Self-reported
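To make the voting step concrete, here is a minimal Python sketch of consensus@k scoring. The sample_fn callable and the assumption that it returns only the final extracted answer are illustrative, not part of the benchmark definition.

```python
from collections import Counter

def consensus_at_k(sample_fn, problem: str, k: int = 64) -> str:
    """Majority-vote ('consensus@k') answer over k sampled reasoning chains.

    sample_fn(problem) is assumed to run one stochastic chain-of-thought
    generation and return only the final extracted answer string.
    """
    answers = [sample_fn(problem) for _ in range(k)]
    # The most frequent final answer across all chains wins the vote.
    return Counter(answers).most_common(1)[0][0]
```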
LiveCodeBench
Pass@1. In testing large language models (LLMs), the Pass@1 metric measures the proportion of tasks the model solves on its first attempt. It reflects the model's ability to produce a correct answer "out of the box", without multiple attempts or iterations. Pass@1 is especially important for judging model performance in real-world usage scenarios, where an exact answer is usually expected on the first try; a high Pass@1 score indicates that the model solves tasks reliably and accurately without additional retries or corrections. Under Pass@1, a test case counts as "passed" if the model's first answer meets the success criteria (for example, it correctly solves the task, answers the question, or produces a working function). The metric is commonly used in LLM analysis for tasks requiring precision, such as mathematical computation, programming, and logical reasoning. • Self-reported
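A minimal sketch of computing Pass@1 over a task set, assuming hypothetical first_answer_fn and is_correct callables (for LiveCodeBench-style code tasks, is_correct would typically run the generated code against unit tests):

```python
def pass_at_1(tasks, first_answer_fn, is_correct) -> float:
    """Share of tasks solved on the first attempt (Pass@1).

    first_answer_fn(task) returns the model's single first answer;
    is_correct(task, answer) is a hypothetical checker, e.g. a unit-test
    runner for code tasks or exact-match comparison for math answers.
    """
    solved = sum(1 for t in tasks if is_correct(t, first_answer_fn(t)))
    return solved / len(tasks)
```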
MATH-500
Pass@1. For mathematical reasoning tasks it is often useful to measure how often the model solves a problem on the first attempt; this is the Pass@1 score. The standard way to estimate Pass@1 is to run the model once on each of a set of problems (for example, 100 or 1,000) and count the share it solves; however, a single sample per task makes the estimate noisy, and repeating whole runs is computationally expensive. A more robust method is to estimate Pass@1 from k samples per task: Pass@1 = c / k, where c is the number of correct solutions among the k sampled attempts, i.e. the average correctness across the samples. This gives a lower-variance estimate at the cost of extra generation. For an even more robust evaluation one can use "self-consistency": the model generates several answers to the same task and the most frequently occurring answer is taken. This approach can improve accuracy, especially when the model's correct answers recur across samples. • Self-reported
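A small sketch of the k-sample estimator described above; the sample counts in the example are made-up illustrative numbers:

```python
def estimate_pass_at_1(samples_correct: list[bool]) -> float:
    """Pass@1 estimated from k stochastic samples of one task:
    the average correctness c / k across the samples."""
    return sum(samples_correct) / len(samples_correct)

# Hypothetical example: 16 samples of one MATH-500 problem, 12 of them correct.
print(estimate_pass_at_1([True] * 12 + [False] * 4))  # 0.75
```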
License & Metadata
License
MIT
Announcement Date
January 20, 2025
Last Updated
July 19, 2025
Similar Models
DeepSeek R1 Distill Llama 70B
DeepSeek
70.6B
Best score: 0.7 (GPQA)
Released: Jan 2025
Price: $0.10/1M tokens
DeepSeek R1 Distill Qwen 14B
DeepSeek
14.8B
Best score: 0.6 (GPQA)
Released: Jan 2025
DeepSeek-R1-0528
DeepSeek
671.0B
Best score: 0.8 (GPQA)
Released: May 2025
Price: $0.70/1M tokens
DeepSeek-V3 0324
DeepSeek
671.0B
Best score: 0.7 (GPQA)
Released: Mar 2025
Price: $0.28/1M tokens
Llama-3.3 Nemotron Super 49B v1
NVIDIA
49.9B
Best score: 0.7 (GPQA)
Released: Mar 2025
Jamba 1.5 Mini
AI21 Labs
52.0B
Best score: 0.9 (ARC)
Released: Aug 2024
Price: $0.20/1M tokens
Mistral Small 3 24B Instruct
Mistral AI
24.0B
Best score: 0.8 (HumanEval)
Released: Jan 2025
Price: $0.10/1M tokens
Gemma 2 27B
Google
27.2B
Best score: 0.8 (MMLU)
Released: Jun 2024
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.