DeepSeek R1 Distill Llama 70B

DeepSeek

DeepSeek-R1-Distill-Llama-70B is a distilled variant of DeepSeek-R1, DeepSeek's first-generation reasoning model built on DeepSeek-V3 (671 billion total parameters, 37 billion activated per token). DeepSeek-R1 uses large-scale reinforcement learning (RL) to improve chain-of-thought reasoning and logical thinking, and its reasoning behavior is distilled into this dense 70B Llama-based model, which delivers strong performance on mathematical tasks, coding, and multi-step reasoning.

Key Specifications

Parameters
70.6B
Context
128.0K
Release Date
January 20, 2025
Average Score
76.0%

Timeline

Key dates in the model's history
Announcement
January 20, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
70.6B
Training Tokens
14.8T tokens
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$0.10
Output (per 1M tokens)
$0.40
Max Input Tokens
128.0K
Max Output Tokens
128.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
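
As a quick illustration of the rates above, a minimal sketch in Python (the token counts in the example are hypothetical):

    # Listed rates: $0.10 per 1M input tokens, $0.40 per 1M output tokens.
    INPUT_PRICE = 0.10 / 1_000_000   # USD per input token
    OUTPUT_PRICE = 0.40 / 1_000_000  # USD per output token

    def request_cost(input_tokens: int, output_tokens: int) -> float:
        """Return the USD cost of a single request at the listed rates."""
        return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

    # Example: a 10,000-token prompt with a 2,000-token completion
    # costs 0.001 + 0.0008 = $0.0018.
    print(f"${request_cost(10_000, 2_000):.4f}")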

Benchmark Results

Model performance metrics across various tests and benchmarks

Reasoning

Logical reasoning and analysis
GPQA
Diamond, Pass@1. Pass@1 on the GPQA Diamond subset (the highest-quality, expert-validated portion of the benchmark) measures the share of questions the model answers correctly on its first attempt. Pass@1 is especially important for practical use, because in the real world there is usually no reference solution available and no way to pick the best answer out of many. A related self-play fine-tuning method, described in "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (Chen et al., 2024), raises a model's Pass@1 as follows: (1) sample the base model many times on each task (for example, 32 attempts); (2) filter the 32 attempts for correct answers; (3) fine-tune the model on its own correct attempts. After such training, the model can solve on the first attempt tasks it could previously solve only somewhere within 32 attempts. Unlike other methods, this requires no additional training data, since it uses only the model's own attempts. (A minimal sketch of this loop follows the score below.) Source: self-reported.
65.2%
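
A minimal sketch of the self-play loop described above; generate, is_correct, and fine_tune are hypothetical placeholders, not a real API:

    from typing import Callable, List, Tuple

    def self_play_round(
        tasks: List[str],
        generate: Callable[[str], str],          # one sampled solution per call
        is_correct: Callable[[str, str], bool],  # (task, solution) -> pass/fail
        attempts: int = 32,
    ) -> List[Tuple[str, str]]:
        """Collect (task, solution) pairs the model got right on its own."""
        training_pairs = []
        for task in tasks:
            for _ in range(attempts):
                solution = generate(task)
                if is_correct(task, solution):
                    training_pairs.append((task, solution))
        return training_pairs

    # The collected pairs would then feed a supervised fine-tuning step,
    # e.g. fine_tune(model, training_pairs); no labeled data is needed
    # beyond the correctness check itself.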

Other Tests

Specialized benchmarks
AIME 2024
Cons@64. Cons@64 (consensus at 64) is an evaluation method for mathematics and reasoning benchmarks in which the model generates 64 different solutions and a majority vote across them selects the final answer. The procedure is: (1) generate 64 independent solutions; (2) extract the final answer from each; (3) take the most frequent answer as the final result. The method works because, for most tasks, an LLM generates the correct answer more often than any single specific incorrect answer. For example, if a model gives the correct answer 30% of the time while every distinct wrong answer individually appears less often than that, then across 64 samples the correct answer will, in aggregate, win the vote. Cons@64 is used to improve measured performance on mathematical benchmarks such as MATH, GSM8K, AIME, and other reasoning tests. (A minimal voting sketch follows the score below.) Source: self-reported.
86.7%
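
A minimal sketch of the Cons@64 majority vote, assuming a hypothetical generate_answer sampler:

    from collections import Counter
    from typing import Callable

    def cons_at_k(problem: str,
                  generate_answer: Callable[[str], str],
                  k: int = 64) -> str:
        """Sample k independent answers and return the most frequent one."""
        answers = [generate_answer(problem) for _ in range(k)]
        # most_common(1) yields [(answer, count)] for the modal answer.
        return Counter(answers).most_common(1)[0][0]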
LiveCodeBench
Pass@1. Pass@1 is a code-generation metric: the probability that a single generated solution is correct. It is the standard score for programming benchmarks such as HumanEval and LiveCodeBench. Unlike Pass@k, which estimates the probability that at least one of k generated answers is correct, Pass@1 allows only one attempt, so it directly reflects the model's ability to produce working code on the first try, without best-of-n sampling. A high Pass@1 is especially important in scenarios where a solution must be obtained without multiple retries, and it serves as a reliability measure for real-world programming tasks. Each task is scored as binary: the solution either passes all tests or it does not, and the overall score is the proportion of solved tasks out of the total. (A short computation sketch follows the score below.) Source: self-reported.
57.5%
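
A minimal sketch of the Pass@1 computation from binary per-task results (the test harness that produces them is assumed):

    from typing import List

    def pass_at_1(results: List[bool]) -> float:
        """Pass@1 = fraction of tasks whose single solution passed all tests."""
        return sum(results) / len(results)

    # Example: 23 of 40 tasks solved on the first attempt -> 0.575 (57.5%).
    print(pass_at_1([True] * 23 + [False] * 17))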
MATH-500
Pass@1. The Pass@1 metric measures the probability of obtaining the correct answer on the first attempt, i.e. how well the model performs when it has exactly one chance per problem. Under Pass@1 the model generates a single solution for each task, which is scored as either correct (1) or incorrect (0), and the overall score is the proportion of correct answers across all test tasks. In the context of evaluating mathematical ability, a high Pass@1 indicates reliability: the model consistently gives correct answers without needing several attempts or selection among several solutions. (The formula is given after the score below.) Source: self-reported.
94.5%
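
Formally, with N test problems, one sampled solution per problem, the model's answer â_i, and the reference answer a_i:

    \mathrm{Pass@1} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\left[\hat{a}_i = a_i\right]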

License & Metadata

License
MIT
Announcement Date
January 20, 2025
Last Updated
July 19, 2025

Similar Models

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.