
Kimi K2 Instruct

Moonshot AI

Kimi K2 Instruct is the instruction-tuned version of Kimi K2, a Mixture-of-Experts (MoE) language model from Moonshot AI with 1 trillion total parameters, of which 32 billion are active per forward pass. It is optimized for instruction following, multi-turn conversation, and agentic use cases, supports context windows up to 128K tokens, and performs strongly on coding, reasoning, and tool-use tasks.
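The gap between 1T total and 32B active parameters comes from sparse expert routing: for each token, a router picks only a few experts, so only their parameters participate in that forward pass. A minimal, illustrative top-k routing sketch in plain Python (the expert count and k below are made up for illustration, not K2's actual configuration):

```python
import math
import random

def top_k_route(logits, k):
    """Pick the k experts with the highest router logits and
    softmax-normalize their weights (sketch of sparse MoE routing)."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    chosen = ranked[:k]
    exps = [math.exp(logits[i]) for i in chosen]
    total = sum(exps)
    return [(i, w / total) for i, w in zip(chosen, exps)]

# 8 hypothetical experts, route each token to 2: only 2/8 of the
# expert parameters are "active" for this token.
random.seed(0)
logits = [random.gauss(0, 1) for _ in range(8)]
routes = top_k_route(logits, k=2)
```

In a real MoE layer the token's hidden state is then processed by only the chosen experts and their outputs are combined with these weights, which is what keeps per-token compute far below the total parameter count.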

Key Specifications

Parameters
1.0T
Context
131.1K
Release Date
January 1, 2025
Average Score
66.7%

Timeline

Key dates in the model's history
Announcement
January 1, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
1.0T
Training Tokens
15.5T tokens
Knowledge Cutoff
-
Family
-
Fine-tuned from
kimi-k2-base
Capabilities
Multimodal · ZeroEval

Pricing & Availability

Input (per 1M tokens)
$0.57
Output (per 1M tokens)
$2.30
Max Input Tokens
131.1K
Max Output Tokens
131.1K
Supported Features
Function Calling · Structured Output · Code Execution · Web Search · Batch Inference · Fine-tuning
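Given the per-million-token prices listed above, the cost of a single request is simple arithmetic. A quick sketch (the token counts in the example are made up):

```python
INPUT_PER_M = 0.57   # USD per 1M input tokens (from the pricing table)
OUTPUT_PER_M = 2.30  # USD per 1M output tokens

def request_cost(input_tokens, output_tokens):
    """Cost in USD of one request at the listed rates."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# e.g. a 10k-token prompt with a 2k-token completion:
# 0.0057 + 0.0046 = 0.0103 USD
cost = request_cost(10_000, 2_000)
```

At these rates, even a prompt that fills the full 131.1K-token input window costs well under ten cents before output tokens are counted.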

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
MMLU
Standard evaluation · Self-reported
89.5%

Programming

Programming skills tests
HumanEval
Pass@1 · Self-reported
93.3%
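Pass@1 here means the fraction of problems solved by a single sampled attempt. When n samples per problem are available, pass@k is commonly computed with the unbiased estimator 1 − C(n−c, k)/C(n, k), where c is the number of correct samples. A minimal sketch of that estimator:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate: probability that at least one of k
    samples drawn (without replacement) from n total is among the
    c correct ones."""
    if n - c < k:
        return 1.0  # too few failures left to fill all k draws
    return 1.0 - comb(n - c, k) / comb(n, k)

# 10 samples, 3 correct: pass@1 ≈ 0.3
p = pass_at_k(10, 3, 1)
```

Averaging this estimate over all problems gives the benchmark score; with k = 1 it reduces to the plain fraction of correct samples.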

Mathematics

Mathematical problems and computations
GSM8k
Accuracy · Self-reported
97.3%

Reasoning

Logical reasoning and analysis
GPQA
Diamond, Avg@8 · Self-reported
75.1%
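Avg@8 means the benchmark is run 8 times with independent sampling and the reported score is the mean accuracy across runs, which damps run-to-run variance. A trivial sketch (the per-run accuracies below are invented for illustration):

```python
def avg_at_k(per_run_accuracy):
    """Avg@k: mean accuracy over k independent evaluation runs."""
    return sum(per_run_accuracy) / len(per_run_accuracy)

# 8 hypothetical runs with slightly varying accuracy
score = avg_at_k([0.76, 0.74, 0.75, 0.76, 0.75, 0.74, 0.76, 0.75])
```

The same convention applies to the Avg@64, Avg@32, and Avg@4 figures reported for the math-competition and Tau2 benchmarks below.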

Other Tests

Specialized benchmarks
AceBench
Accuracy · Self-reported
76.5%
Aider-Polyglot
Accuracy · Self-reported
60.0%
AIME 2024
Avg@64 · Self-reported
69.6%
AIME 2025
Avg@64 · Self-reported
49.5%
AutoLogi
Accuracy · Self-reported
89.5%
CBNSL
Accuracy · Self-reported
95.6%
CNMO 2024
Avg@16 · Self-reported
74.3%
CSimpleQA
Correct · Self-reported
78.4%
HMMT 2025
Avg@32 · Self-reported
38.8%
HumanEval-ER
Pass@1 · Self-reported
81.1%
Humanity's Last Exam
Accuracy · Self-reported
4.7%
IFEval
Self-reported
89.8%
LiveBench
Pass@1 · Self-reported
76.4%
LiveCodeBench v6
Pass@1 · Self-reported
53.7%
MATH-500
Accuracy · Self-reported
97.4%
MMLU-Pro
EM (exact match) · Self-reported
81.1%
MMLU-Redux
EM (exact match) · Self-reported
92.7%
MultiChallenge
Accuracy · Self-reported
54.1%
MultiPL-E
Pass@1 · Self-reported
85.7%
MuSR
Pass@1 · Self-reported
76.4%
OJBench
Pass@1 · Self-reported
27.1%
PolyMath-en
Avg@4 · Self-reported
65.1%
SimpleQA
Standard evaluation · Self-reported
31.0%
SuperGPQA
Accuracy · Self-reported
57.2%
SWE-bench Multilingual
Standard evaluation · Self-reported
47.3%
SWE-bench Verified (Agentic Coding)
Standard evaluation · Self-reported
65.8%
SWE-bench Verified (Agentless)
Self-reported
51.8%
SWE-bench Verified (Multiple Attempts)
Self-reported
71.6%
Tau2 airline
Avg@4 · Self-reported
56.5%
Tau2 retail
Avg@4 · Self-reported
70.6%
Tau2 telecom
Avg@4 · Self-reported
65.8%
Terminal-bench
Self-reported
30.0%
Terminus
Accuracy · Self-reported
25.0%
ZebraLogic
Accuracy · Self-reported
89.0%

License & Metadata

License
Modified MIT License
Announcement Date
January 1, 2025
Last Updated
July 19, 2025
