DeepSeek R1 Distill Llama 70B
DeepSeek-R1-Distill-Llama-70B is a 70B dense model distilled from DeepSeek-R1, DeepSeek's first-generation reasoning model built on DeepSeek-V3 (671 billion total parameters, 37 billion activated per token). DeepSeek-R1 uses large-scale reinforcement learning (RL) to improve chain-of-thought and logical reasoning; the distilled variant transfers those abilities to a Llama-3.3-70B-Instruct base by fine-tuning it on reasoning data generated by R1, delivering strong performance on mathematical tasks, coding, and multi-step reasoning.
Key Specifications
Parameters
70.6B
Context
128.0K
Release Date
January 20, 2025
Average Score
76.0%
Timeline
Key dates in the model's history
Announcement
January 20, 2025
Last Update
July 19, 2025
Technical Specifications
Parameters
70.6B
Training Tokens
14.8T tokens
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$0.10
Output (per 1M tokens)
$0.40
Max Input Tokens
128.0K
Max Output Tokens
128.0K
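For orientation, here is a minimal sketch of how the listed per-token prices translate into per-request cost; the token counts in the example are illustrative assumptions, not measurements.

```python
# Rough cost estimate from the per-token prices listed above
# ($0.10 per 1M input tokens, $0.40 per 1M output tokens).

INPUT_PRICE_PER_M = 0.10   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.40  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example (hypothetical sizes): a 4,000-token prompt with a 1,500-token
# completion costs about $0.0004 + $0.0006 = $0.0010.
print(f"${request_cost(4_000, 1_500):.4f}")
```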
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
Reasoning
Logical reasoning and analysis
GPQA
Diamond, Pass@1 — GPQA (Graduate-Level Google-Proof Q&A) is a benchmark of PhD-level multiple-choice questions in biology, physics, and chemistry; the Diamond subset is the hardest, highest-quality split. Pass@1 means the score counts only the model's first answer to each question, without sampling many candidates and selecting the best one. Pass@1 is especially important for practical use, since in real-world settings there is usually no reference solution available for picking the best answer, so it reflects accuracy in single-shot use. • Self-reported
Other Tests
Specialized benchmarks
AIME 2024
Cons@64 — a consensus (majority-voting) evaluation method for mathematics and reasoning benchmarks: the model generates 64 independent solutions to each problem and the most frequent answer is taken as the final one. The method works because, for most tasks, an LLM produces the correct answer more often than any specific incorrect answer; for example, if the model answers correctly in 30% of cases and every individual wrong answer appears with lower frequency, the correct answer still wins the majority vote over 64 samples. Cons@64 is used to improve reported performance on mathematical and reasoning benchmarks such as MATH, GSM8K, and AIME. • Self-reported
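Illustrative only: a minimal Python sketch of the majority-vote step described above, assuming each of the 64 sampled solutions has already been reduced to a short final answer (answer extraction itself is not shown).

```python
from collections import Counter

def cons_at_k(answers: list[str]) -> str:
    """Majority vote over k sampled answers (here k = len(answers)).

    Each element is the final answer extracted from one independently
    sampled solution; the most frequent answer becomes the consensus
    prediction."""
    return Counter(answers).most_common(1)[0][0]

# Hypothetical Cons@64-style vote: the correct answer "42" appears most
# often even though it is produced in well under half of the 64 samples.
samples = ["42"] * 20 + ["41"] * 12 + ["40"] * 12 + ["7"] * 10 + ["13"] * 10
print(cons_at_k(samples))  # -> "42"
```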
LiveCodeBench
Pass@1 — a code-generation metric that estimates the probability that a single attempt produces a correct solution. It is the standard score for programming benchmarks such as HumanEval. Unlike Pass@k, which counts a problem as solved if any of k generated answers is correct, Pass@1 allows only one attempt, so it directly reflects the model's ability to produce working code on the first try without resampling, which also makes it a good indicator of reliability in real programming tasks. Each task is scored binarily: the solution either passes all tests or it does not, and the final score is the proportion of successfully solved tasks. • Self-reported
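To make the Pass@1 vs. Pass@k contrast concrete, here is a small sketch of the standard unbiased pass@k estimator used for HumanEval-style evaluation (Chen et al., 2021); the per-problem sample counts in the example are hypothetical.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem, given n sampled
    solutions of which c passed all tests (Chen et al., 2021).

    pass@k = 1 - C(n - c, k) / C(n, k); for k = 1 it reduces to c / n.
    """
    if n - c < k:
        return 1.0  # every size-k draw contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 10 samples per problem, 3 of them pass all tests.
print(pass_at_k(n=10, c=3, k=1))  # 0.3 — plain single-attempt accuracy
print(pass_at_k(n=10, c=3, k=5))  # chance that at least one of 5 draws passes
```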
MATH-500
Pass@1 — measures the probability of obtaining the correct answer to a problem on the first attempt, i.e. how well the model performs when it gets only one chance to solve each problem. The model generates one solution per task, which is scored as either correct (1) or incorrect (0), and Pass@1 is the proportion of correct answers across all test tasks. For evaluating mathematical ability, Pass@1 is an important reliability indicator: a high score means the model consistently gives correct answers without needing multiple attempts or selection among several solutions. • Self-reported
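As a worked illustration of the definition above, Pass@1 over a whole test set is simply the mean of per-problem binary grades; the grades below are hypothetical, not actual MATH-500 results.

```python
def pass_at_1(grades: list[int]) -> float:
    """Pass@1 over a benchmark: mean of per-problem binary grades,
    where each grade is 1 if the single generated solution reached the
    correct final answer and 0 otherwise."""
    return sum(grades) / len(grades)

# Hypothetical grades for a 10-problem slice of a MATH-style test set:
# 8 problems solved on the first attempt, 2 missed -> Pass@1 = 0.8.
print(pass_at_1([1, 1, 0, 1, 1, 1, 0, 1, 1, 1]))
```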
License & Metadata
License
MIT
Announcement Date
January 20, 2025
Last Updated
July 19, 2025
Similar Models
DeepSeek R1 Distill Qwen 14B
DeepSeek
14.8B
Best score: 0.6 (GPQA)
Released: Jan 2025
DeepSeek R1 Distill Qwen 32B
DeepSeek
32.8B
Best score: 0.6 (GPQA)
Released: Jan 2025
Price: $0.12/1M tokens
DeepSeek-R1-0528
DeepSeek
671.0B
Best score: 0.8 (GPQA)
Released: May 2025
Price: $0.70/1M tokens
DeepSeek-V3 0324
DeepSeek
671.0B
Best score: 0.7 (GPQA)
Released: Mar 2025
Price: $0.28/1M tokens
Llama-3.3 Nemotron Super 49B v1
NVIDIA
49.9B
Best score: 0.7 (GPQA)
Released: Mar 2025
Jamba 1.5 Mini
AI21 Labs
52.0B
Best score: 0.9 (ARC)
Released: Aug 2024
Price: $0.20/1M tokens
Mistral Small 3 24B Instruct
Mistral AI
24.0B
Best score: 0.8 (HumanEval)
Released: Jan 2025
Price: $0.10/1M tokens
Gemma 2 27B
27.2B
Best score: 0.8 (MMLU)
Released: Jun 2024
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.