Kimi k1.5
Multimodal
Kimi k1.5 is a next-generation multimodal language model developed by Moonshot AI. It leverages advanced reinforcement learning (RL) and scalable multimodal reasoning, delivering top-tier performance in mathematics, coding, computer vision, and long-context reasoning tasks.
Key Specifications
Parameters: -
Context: -
Release Date: January 20, 2025
Average Score: 81.7%
Timeline
Key dates in the model's history
Announcement: January 20, 2025
Last Update: July 19, 2025
Technical Specifications
Parameters: -
Training Tokens: -
Knowledge Cutoff: -
Family: -
Capabilities
Multimodal, ZeroEval
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
Metric: accuracy • Self-reported
Multimodal
Working with images and visual data
MathVista
Metric: Pass@1 • Self-reported
Pass@1 measures the fraction of test samples solved correctly on the first attempt: the model is run once on each sample and we report the percentage of samples whose single response is the correct answer. This captures whether the model gets the right answer right away, but does not allow it to refine a solution over multiple attempts, which may be a more realistic measure of usefulness for challenging problems that require exploration. Pass@1 is a straightforward metric that has been widely used in prior work to assess the reasoning and problem-solving abilities of large language models.
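For concreteness, here is a minimal Pass@1 sketch in Python. The `model_answer` and `is_correct` callables are hypothetical stand-ins for the single model call and the benchmark's answer checker, not part of any named API.

```python
from typing import Callable, Sequence

def pass_at_1(
    problems: Sequence[str],
    references: Sequence[str],
    model_answer: Callable[[str], str],      # hypothetical: one model call per problem
    is_correct: Callable[[str, str], bool],  # hypothetical: benchmark's answer checker
) -> float:
    """Fraction of problems solved on the first (and only) attempt."""
    solved = sum(
        is_correct(model_answer(problem), reference)
        for problem, reference in zip(problems, references)
    )
    return solved / len(problems)
```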
MMMU
Metric: Pass@1 • Self-reported
The accompanying note sketches the naive pass@k estimate: with n sampled solutions per problem, of which c are correct, and attempts treated as independent, pass@k ≈ 1 - (1 - c/n)^k. For example, on HumanEval with n = 100 and c = 40, pass@100 ≈ 1 - (1 - 40/100)^100 = 1 - 0.6^100 ≈ 1 - 10^(-22) ≈ 1.
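A quick check of that arithmetic (the function name is ours, not from any library):

```python
def naive_pass_at_k(n: int, c: int, k: int) -> float:
    # Treat each of the k attempts as independent with per-attempt success rate c/n.
    return 1.0 - (1.0 - c / n) ** k

# Worked example from the note: n = 100 samples, c = 40 correct, k = 100 attempts.
print(naive_pass_at_k(100, 40, 100))  # 1 - 0.6**100 ≈ 1 - 6.5e-23, prints 1.0
```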
Other Tests
Specialized benchmarks
AIME 2024
Metric: Pass@1 • Self-reported
The computation is: (1) run the model once on each problem in the test set, (2) check each response against the reference answer, and (3) average: Pass@1 = Σᵢ 1[response i is correct] / N. It is the k = 1 case of Pass@k and is widely used on coding benchmarks such as HumanEval and MBPP for models like Codex and Code Llama.
C-Eval
Metric: exact answer matching • Self-reported
LLMs can reason through complex problems step by step like a human expert, but how do we know the final answer is correct? Problems in domains like math have exact answers, so the model's answer can be checked against the known correct one.
Method details: compare the model's final answer with the known correct answer; the model is marked correct only on an exact match.
Advantages:
- Simple to implement
- Works well for problems with unique answers
- Objective assessment with no human judgment required
Limitations:
- Very sensitive to formatting differences
- May penalize valid alternative expressions or notations
- Cannot assess reasoning quality or alternative approaches
- Often misses near-correct answers
When to use: best for problems with clear, unambiguous answers that can be standardized in form, such as multiple-choice questions or specific numerical answers.
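A minimal exact-match scorer along these lines. The whitespace trimming and case-folding are our assumptions; some benchmarks compare raw strings with no normalization at all.

```python
def exact_match(prediction: str, reference: str) -> int:
    """Score 1 if the final answer matches the reference exactly, else 0.

    Trimming and case-folding are assumed normalizations; strictness
    varies from benchmark to benchmark.
    """
    return int(prediction.strip().casefold() == reference.strip().casefold())

assert exact_match("42", "42") == 1
assert exact_match("forty-two", "42") == 0  # no credit for equivalent wording
```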
CLUEWSC
Metric: accuracy, scored by exact match against the reference answer (e.g. a model answer of «1789» that equals the reference is marked "correct") • Self-reported
IFEval
Metric: instruction-following accuracy • Self-reported
LiveCodeBench v5 (24.12–25.2)
Metric: Pass@1 • Self-reported
Following Chen et al. (2021), Pass@1 is estimated by sampling k completions per problem, counting the c that pass, and averaging the per-problem success rate c/k; the same sampling scheme generalizes to Pass@k for k > 1.
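For reference, the unbiased pass@k estimator from Chen et al. (2021); this is the standard formulation from the HumanEval paper, not something specific to this card.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from Chen et al. (2021).

    n: total completions sampled per problem
    c: completions that pass the tests
    k: attempt budget (for k = 1 this reduces to c / n)
    """
    if n - c < k:
        return 1.0  # every possible draw of k samples contains a passing one
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))
```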
MATH-500
Metric: exact match • Self-reported
The model's final answer must exactly match the reference: if the reference is "42", an answer of "42" scores 1, while "forty-two" scores 0. This is useful for factual questions, calculations, dates, and names with definitive answers; the limitation is that strict matching gives no credit to semantically equivalent answers expressed differently. Scoring protocol: 1 if the answer exactly matches the reference, 0 otherwise.
License & Metadata
License: proprietary
Announcement Date: January 20, 2025
Last Updated: July 19, 2025
Similar Models
Kimi K2.5
Moonshot AI
MM, 1.0T
Best score: 0.9 (GPQA)
Released: Jan 2026

ERNIE 5.0
Baidu
MM
Best score: 0.8 (GPQA)
Released: Jan 2025

Nova Pro
Amazon
MM
Best score: 0.9 (ARC)
Released: Nov 2024
Price: $0.80/1M tokens

Gemini 3 Pro
Google
MM
Best score: 0.9 (GPQA)
Released: Nov 2025
Price: $2.00/1M tokens

Gemini 3 Flash
Google
MM
Best score: 0.9 (GPQA)
Released: Dec 2025
Price: $0.50/1M tokens

Claude Opus 4.5
Anthropic
MM
Best score: 0.9 (TAU)
Released: Nov 2025
Price: $5.00/1M tokens

Claude 3.5 Sonnet
Anthropic
MM
Best score: 0.9 (HumanEval)
Released: Jun 2024
Price: $3.00/1M tokens

Claude Opus 4.6
Anthropic
MM
Best score: 1.0 (TAU)
Released: Feb 2026
Price: $5.00/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.