Kimi k1.5
Multimodal
Kimi k1.5 is a next-generation multimodal language model developed by Moonshot AI. It leverages advanced reinforcement learning (RL) and scalable multimodal reasoning, delivering top-tier performance in mathematics, coding, computer vision, and long-context reasoning tasks.
Key Specifications
Parameters: -
Context: -
Release Date: January 20, 2025
Average Score: 81.7%
Timeline
Key dates in the model's history
Announcement: January 20, 2025
Last Update: July 19, 2025
Technical Specifications
Parameters: -
Training Tokens: -
Knowledge Cutoff: -
Family: -
Capabilities
Multimodal, ZeroEval
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
Metric: accuracy • Self-reported
Multimodal
Working with images and visual data
MathVista
Metric: Pass@1 • Self-reported
Pass@1 measures the fraction of test samples solved correctly on the first attempt: the model is run once on each sample and we report the percentage of samples whose single response is the correct answer. This captures whether the model gets the right answer right away, but does not allow it to refine a solution over multiple attempts, which may be a more realistic measure of usefulness for challenging problems that require exploration. Pass@1 is a straightforward metric that has been widely used in prior work to assess the reasoning and problem-solving abilities of large language models.
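For concreteness, here is a minimal Pass@1 sketch in Python. The `model_answer` and `is_correct` callables are hypothetical stand-ins for the single model call and the benchmark's answer checker, not part of any named API.

```python
from typing import Callable, Sequence

def pass_at_1(
    problems: Sequence[str],
    references: Sequence[str],
    model_answer: Callable[[str], str],      # hypothetical: one model call per problem
    is_correct: Callable[[str, str], bool],  # hypothetical: benchmark's answer checker
) -> float:
    """Fraction of problems solved on the first (and only) attempt."""
    solved = sum(
        is_correct(model_answer(problem), reference)
        for problem, reference in zip(problems, references)
    )
    return solved / len(problems)
```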
MMMU
Metric: Pass@1 • Self-reported
The accompanying note sketches the naive pass@k estimate: with n sampled solutions per problem, of which c are correct, and attempts treated as independent, pass@k ≈ 1 - (1 - c/n)^k. For example, on HumanEval with n = 100 and c = 40, pass@100 ≈ 1 - (1 - 40/100)^100 = 1 - 0.6^100 ≈ 1 - 10^(-22) ≈ 1.
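A quick check of that arithmetic (the function name is ours, not from any library):

```python
def naive_pass_at_k(n: int, c: int, k: int) -> float:
    # Treat each of the k attempts as independent with per-attempt success rate c/n.
    return 1.0 - (1.0 - c / n) ** k

# Worked example from the note: n = 100 samples, c = 40 correct, k = 100 attempts.
print(naive_pass_at_k(100, 40, 100))  # 1 - 0.6**100 ≈ 1 - 6.5e-23, prints 1.0
```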
Other Tests
Specialized benchmarks
AIME 2024
Metric: Pass@1 • Self-reported
The computation is: (1) run the model once on each problem in the test set, (2) check each response against the reference answer, and (3) average: Pass@1 = Σᵢ 1[response i is correct] / N. It is the k = 1 case of Pass@k and is widely used on coding benchmarks such as HumanEval and MBPP for models like Codex and Code Llama.
C-Eval
Metric: exact answer matching • Self-reported
LLMs can reason through complex problems step by step like a human expert, but how do we know the final answer is correct? Problems in domains like math have exact answers, so the model's answer can be checked against the known correct one.
Method details: compare the model's final answer with the known correct answer; the model is marked correct only on an exact match.
Advantages:
- Simple to implement
- Works well for problems with unique answers
- Objective assessment with no human judgment required
Limitations:
- Very sensitive to formatting differences
- May penalize valid alternative expressions or notations
- Cannot assess reasoning quality or alternative approaches
- Often misses near-correct answers
When to use: best for problems with clear, unambiguous answers that can be standardized in form, such as multiple-choice questions or specific numerical answers.
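A minimal exact-match scorer along these lines. The whitespace trimming and case-folding are our assumptions; some benchmarks compare raw strings with no normalization at all.

```python
def exact_match(prediction: str, reference: str) -> int:
    """Score 1 if the final answer matches the reference exactly, else 0.

    Trimming and case-folding are assumed normalizations; strictness
    varies from benchmark to benchmark.
    """
    return int(prediction.strip().casefold() == reference.strip().casefold())

assert exact_match("42", "42") == 1
assert exact_match("forty-two", "42") == 0  # no credit for equivalent wording
```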
CLUEWSC
Metric: accuracy, scored by exact match against the reference answer (e.g. a model answer of «1789» that equals the reference is marked "correct") • Self-reported
IFEval
Metric: instruction-following accuracy • Self-reported
LiveCodeBench v5 (24.12–25.2)
Metric: Pass@1 • Self-reported
Following Chen et al. (2021), Pass@1 is estimated by sampling k completions per problem, counting the c that pass, and averaging the per-problem success rate c/k; the same sampling scheme generalizes to Pass@k for k > 1.
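For reference, the unbiased pass@k estimator from Chen et al. (2021); this is the standard formulation from the HumanEval paper, not something specific to this card.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from Chen et al. (2021).

    n: total completions sampled per problem
    c: completions that pass the tests
    k: attempt budget (for k = 1 this reduces to c / n)
    """
    if n - c < k:
        return 1.0  # every possible draw of k samples contains a passing one
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))
```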
MATH-500
Metric: exact match • Self-reported
The model's final answer must exactly match the reference: if the reference is "42", an answer of "42" scores 1, while "forty-two" scores 0. This is useful for factual questions, calculations, dates, and names with definitive answers; the limitation is that strict matching gives no credit to semantically equivalent answers expressed differently. Scoring protocol: 1 if the answer exactly matches the reference, 0 otherwise.
License & Metadata
License: proprietary
Announcement Date: January 20, 2025
Last Updated: July 19, 2025
Similar Models
Kimi K2.5
Moonshot AI
MM, 1.0T
Best score: 0.9 (GPQA)
Released: Jan 2026

ERNIE 5.0
Baidu
MM
Best score: 0.8 (GPQA)
Released: Jan 2025

Nova Pro
Amazon
MM
Best score: 0.9 (ARC)
Released: Nov 2024
Price: $0.80/1M tokens

Gemini 3 Pro
Google
MM
Best score: 0.9 (GPQA)
Released: Nov 2025
Price: $2.00/1M tokens

Gemini 3 Flash
Google
MM
Best score: 0.9 (GPQA)
Released: Dec 2025
Price: $0.50/1M tokens

Claude Opus 4.5
Anthropic
MM
Best score: 0.9 (TAU)
Released: Nov 2025
Price: $5.00/1M tokens

Claude 3.5 Sonnet
Anthropic
MM
Best score: 0.9 (HumanEval)
Released: Jun 2024
Price: $3.00/1M tokens

Claude Opus 4.6
Anthropic
MM
Best score: 1.0 (TAU)
Released: Feb 2026
Price: $5.00/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.