DeepSeek R1 Distill Llama 70B
DeepSeek-R1-Distill-Llama-70B is a 70B dense model distilled from DeepSeek-R1, DeepSeek's first-generation reasoning model built on DeepSeek-V3 (671 billion total parameters, 37 billion activated per token). DeepSeek-R1 uses large-scale reinforcement learning (RL) to improve chain-of-thought and logical reasoning; the distilled variant transfers those abilities to a Llama-3.3-70B-Instruct base by fine-tuning it on reasoning data generated by R1, delivering strong performance on mathematical tasks, coding, and multi-step reasoning.
Key Specifications
Parameters
70.6B
Context
128.0K
Release Date
January 20, 2025
Average Score
76.0%
Timeline
Key dates in the model's history
Announcement
January 20, 2025
Last Update
July 19, 2025
Technical Specifications
Parameters
70.6B
Training Tokens
14.8T tokens
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$0.10
Output (per 1M tokens)
$0.40
Max Input Tokens
128.0K
Max Output Tokens
128.0K
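For orientation, here is a minimal sketch of how the listed per-token prices translate into per-request cost; the token counts in the example are illustrative assumptions, not measurements.

```python
# Rough cost estimate from the per-token prices listed above
# ($0.10 per 1M input tokens, $0.40 per 1M output tokens).

INPUT_PRICE_PER_M = 0.10   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.40  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example (hypothetical sizes): a 4,000-token prompt with a 1,500-token
# completion costs about $0.0004 + $0.0006 = $0.0010.
print(f"${request_cost(4_000, 1_500):.4f}")
```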
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
Reasoning
Logical reasoning and analysis
GPQA
Diamond, Pass@1 — GPQA (Graduate-Level Google-Proof Q&A) is a benchmark of PhD-level multiple-choice questions in biology, physics, and chemistry; the Diamond subset is the hardest, highest-quality split. Pass@1 means the score counts only the model's first answer to each question, without sampling many candidates and selecting the best one. Pass@1 is especially important for practical use, since in real-world settings there is usually no reference solution available for picking the best answer, so it reflects accuracy in single-shot use. • Self-reported
Other Tests
Specialized benchmarks
AIME 2024
Cons@64 — a consensus (majority-voting) evaluation method for mathematics and reasoning benchmarks: the model generates 64 independent solutions to each problem and the most frequent answer is taken as the final one. The method works because, for most tasks, an LLM produces the correct answer more often than any specific incorrect answer; for example, if the model answers correctly in 30% of cases and every individual wrong answer appears with lower frequency, the correct answer still wins the majority vote over 64 samples. Cons@64 is used to improve reported performance on mathematical and reasoning benchmarks such as MATH, GSM8K, and AIME. • Self-reported
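Illustrative only: a minimal Python sketch of the majority-vote step described above, assuming each of the 64 sampled solutions has already been reduced to a short final answer (answer extraction itself is not shown).

```python
from collections import Counter

def cons_at_k(answers: list[str]) -> str:
    """Majority vote over k sampled answers (here k = len(answers)).

    Each element is the final answer extracted from one independently
    sampled solution; the most frequent answer becomes the consensus
    prediction."""
    return Counter(answers).most_common(1)[0][0]

# Hypothetical Cons@64-style vote: the correct answer "42" appears most
# often even though it is produced in well under half of the 64 samples.
samples = ["42"] * 20 + ["41"] * 12 + ["40"] * 12 + ["7"] * 10 + ["13"] * 10
print(cons_at_k(samples))  # -> "42"
```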
LiveCodeBench
Pass@1 — a code-generation metric that estimates the probability that a single attempt produces a correct solution. It is the standard score for programming benchmarks such as HumanEval. Unlike Pass@k, which counts a problem as solved if any of k generated answers is correct, Pass@1 allows only one attempt, so it directly reflects the model's ability to produce working code on the first try without resampling, which also makes it a good indicator of reliability in real programming tasks. Each task is scored binarily: the solution either passes all tests or it does not, and the final score is the proportion of successfully solved tasks. • Self-reported
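To make the Pass@1 vs. Pass@k contrast concrete, here is a small sketch of the standard unbiased pass@k estimator used for HumanEval-style evaluation (Chen et al., 2021); the per-problem sample counts in the example are hypothetical.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem, given n sampled
    solutions of which c passed all tests (Chen et al., 2021).

    pass@k = 1 - C(n - c, k) / C(n, k); for k = 1 it reduces to c / n.
    """
    if n - c < k:
        return 1.0  # every size-k draw contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 10 samples per problem, 3 of them pass all tests.
print(pass_at_k(n=10, c=3, k=1))  # 0.3 — plain single-attempt accuracy
print(pass_at_k(n=10, c=3, k=5))  # chance that at least one of 5 draws passes
```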
MATH-500
Pass@1 — measures the probability of obtaining the correct answer to a problem on the first attempt, i.e. how well the model performs when it gets only one chance to solve each problem. The model generates one solution per task, which is scored as either correct (1) or incorrect (0), and Pass@1 is the proportion of correct answers across all test tasks. For evaluating mathematical ability, Pass@1 is an important reliability indicator: a high score means the model consistently gives correct answers without needing multiple attempts or selection among several solutions. • Self-reported
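As a worked illustration of the definition above, Pass@1 over a whole test set is simply the mean of per-problem binary grades; the grades below are hypothetical, not actual MATH-500 results.

```python
def pass_at_1(grades: list[int]) -> float:
    """Pass@1 over a benchmark: mean of per-problem binary grades,
    where each grade is 1 if the single generated solution reached the
    correct final answer and 0 otherwise."""
    return sum(grades) / len(grades)

# Hypothetical grades for a 10-problem slice of a MATH-style test set:
# 8 problems solved on the first attempt, 2 missed -> Pass@1 = 0.8.
print(pass_at_1([1, 1, 0, 1, 1, 1, 0, 1, 1, 1]))
```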
License & Metadata
License
MIT
Announcement Date
January 20, 2025
Last Updated
July 19, 2025
Similar Models
DeepSeek R1 Distill Qwen 14B
DeepSeek
14.8B
Best score: 0.6 (GPQA)
Released: Jan 2025
DeepSeek R1 Distill Qwen 32B
DeepSeek
32.8B
Best score: 0.6 (GPQA)
Released: Jan 2025
Price: $0.12/1M tokens
DeepSeek-R1-0528
DeepSeek
671.0B
Best score: 0.8 (GPQA)
Released: May 2025
Price: $0.70/1M tokens
DeepSeek-V3 0324
DeepSeek
671.0B
Best score: 0.7 (GPQA)
Released: Mar 2025
Price: $0.28/1M tokens
Llama-3.3 Nemotron Super 49B v1
NVIDIA
49.9B
Best score: 0.7 (GPQA)
Released: Mar 2025
Jamba 1.5 Mini
AI21 Labs
52.0B
Best score: 0.9 (ARC)
Released: Aug 2024
Price: $0.20/1M tokens
Mistral Small 3 24B Instruct
Mistral AI
24.0B
Best score: 0.8 (HumanEval)
Released: Jan 2025
Price: $0.10/1M tokens
Gemma 2 27B
27.2B
Best score: 0.8 (MMLU)
Released: Jun 2024
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.