Mistral Large 2
A 123 billion parameter model with strong capabilities in code generation, math, and reasoning. Features improved multilingual support for dozens of languages, a 128k context window, and advanced function calling capabilities. Excels at instruction following and delivers concise results.
Key Specifications
Parameters
123.0B
Context
128.0K
Release Date
July 24, 2024
Average Score
87.6%
Timeline
Key dates in the model's history
Announcement
July 24, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
123.0B
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$2.00
Output (per 1M tokens)
$6.00
Max Input Tokens
128.0K
Max Output Tokens
128.0K
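At the listed rates, the price of a request is simple arithmetic; a minimal sketch (illustrative helper, not an official billing tool):

```python
# Illustrative cost estimate at the listed rates for Mistral Large 2:
# $2.00 per 1M input tokens, $6.00 per 1M output tokens.
INPUT_USD_PER_M = 2.00
OUTPUT_USD_PER_M = 6.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at the listed prices."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

# Example: a 10,000-token prompt with a 1,000-token completion
print(f"${request_cost(10_000, 1_000):.4f}")  # -> $0.0260
```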
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
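Function calling, the first feature listed above, lets the model return a structured tool invocation instead of free text. Below is a minimal sketch against Mistral's chat completions endpoint; the get_weather tool and its schema are hypothetical, the model identifier is assumed, and the payload shape should be verified against the current API docs:

```python
import os
import requests

# Hypothetical tool definition; the "tools" schema follows the
# OpenAI-style function-calling convention that Mistral's API accepts.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-large-2407",  # assumed API id for Mistral Large 2
        "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
        "tools": tools,
    },
    timeout=30,
)
resp.raise_for_status()
# If the model opts to call the tool, the message carries "tool_calls"
# with JSON arguments instead of plain text content.
print(resp.json()["choices"][0]["message"].get("tool_calls"))
```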
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
Accuracy
In LLM evaluation, accuracy measures the model's ability to answer questions or solve tasks correctly. It is computed as the proportion of correct answers out of the total number of questions, typically by verifying outputs against reference answers. Accuracy matters because it shows how well a model performs in a specific domain or across varied contexts; it helps identify the model's strengths and limitations and highlights areas that need further improvement. • Self-reported
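In code, the metric reduces to a simple ratio; a minimal sketch with hypothetical multiple-choice predictions (exact-match scoring assumed):

```python
def accuracy(predictions: list[str], references: list[str]) -> float:
    """Proportion of predictions that exactly match the reference answers."""
    assert len(predictions) == len(references)
    return sum(p == r for p, r in zip(predictions, references)) / len(references)

# Hypothetical MMLU-style multiple-choice outputs (answer letters A-D)
print(accuracy(["B", "C", "A", "D"], ["B", "C", "B", "D"]))  # -> 0.75
```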
Programming
Programming skills tests
HumanEval
Pass@1
The Pass@1 evaluation method measures the percentage of tasks the model solves successfully on its first attempt. It is a direct measure of output quality with no opportunity for corrections or retries, which makes it especially important for scenarios where users need an exact answer immediately, or where repeated attempts are ruled out by time or system constraints. Under Pass@1:
• The model receives each task exactly once
• It generates a single solution
• The solution is judged correct or incorrect
• The final score is the percentage of correct solutions across all tasks
A high Pass@1 score indicates a reliable model that can find solutions without additional iterations. The metric is often used when comparing models to establish baseline performance. • Self-reported
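With one sample per task, Pass@1 is just the solve rate; the unbiased pass@k estimator from Chen et al. (2021) generalizes this when several samples per task are available. A minimal sketch with hypothetical outcomes:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the probability that
    at least one of k samples, drawn from n generations of which c are
    correct, passes the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With a single sample per task, pass@1 reduces to the plain solve rate:
results = [True, False, True, True]  # hypothetical per-task outcomes
print(sum(results) / len(results))   # -> 0.75
print(pass_at_k(n=10, c=4, k=1))     # -> 0.4 with 10 samples, 4 correct
```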
Mathematics
Mathematical problems and computations
GSM8k
Accuracy
AI-generated answers can sound plausible but still be incorrect. To measure how well LLMs can produce factually accurate answers, we can present them with test questions with known answers, and assess the percentage of their responses that are correct.
Benchmarks:
- General Knowledge: MMLU, a comprehensive test covering 57 subjects from STEM to humanities.
- Advanced Knowledge: GPQA (Graduate-Level Google-Proof Q&A), which tests expert-level knowledge.
- Mathematical Reasoning: GSM8K for grade school math problems, and MATH for competition-level problems.
This metric helps evaluate if models can reliably generate factual information rather than just producing coherent-sounding text. High accuracy suggests an LLM can be trusted to provide correct information within its training domain. • Self-reported
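For GSM8K-style problems, grading typically means extracting the final number from the model's free-form reasoning and comparing it to the known reference; a minimal sketch (regex heuristic, hypothetical example):

```python
import re

def final_number(text: str) -> str | None:
    """Take the last number in the model's output as its final answer."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None

# Hypothetical GSM8K-style check against a known reference answer
output = "She buys 3 packs of 12 eggs, so 3 * 12 = 36 eggs. Answer: 36"
print(final_number(output) == "36")  # -> True
```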
Other Tests
Specialized benchmarks
MMLU French
Accuracy
The proportion of questions for which the model gives the correct answer, here measured on the French-language version of MMLU. • Self-reported
MT-Bench
Score
Multi-turn conversation quality rated by an LLM judge. • Self-reported
License & Metadata
License
Mistral Research License
Announcement Date
July 24, 2024
Last Updated
July 19, 2025
Similar Models
Codestral-22B
Mistral AI
22.2B
Best score: 0.8 (HumanEval)
Released: May 2024
Price: $0.20/1M tokens
LongCat-Flash-Thinking-2601
Meituan
560.0B
Best score: 1.0 (TAU)
Released: Jan 2026
DeepSeek-R1
DeepSeek
671.0B
Best score: 0.9 (MMLU)
Released: Jan 2025
Price: $3.00/1M tokens
GLM-5
Zhipu AI
744.0B
Best score: 0.9 (TAU)
Released: Feb 2026
GLM-4.7
Zhipu AI
358.0B
Best score: 0.9 (TAU)
Released: Dec 2025
Price: $0.60/1M tokens
DeepSeek-V2.5
DeepSeek
236.0B
Best score: 0.9 (HumanEval)
Released: May 2024
Price: $2.00/1M tokens
Kimi K2-Thinking-0905
Moonshot AI
1.0T
Best score: 0.8 (GPQA)
Released: Sep 2025
Price: $0.60/1M tokens
Nemotron 3 Super (120B A12B)
NVIDIA
120.0B
Best score: 0.8 (GPQA)
Released: Mar 2026
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.