
Mistral Large 2

Mistral AI

A 123 billion parameter model with strong capabilities in code generation, math, and reasoning. Features improved multilingual support for dozens of languages, a 128k context window, and advanced function calling capabilities. Excels at instruction following and delivers concise results.
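The advanced function calling mentioned above is exposed through Mistral's OpenAI-compatible chat API. Below is a minimal sketch of how a tool definition might be attached to a request payload; the model identifier, tool name, and schema layout here are illustrative assumptions, not guaranteed specifics of the API.

```python
import json

# Hypothetical tool definition in the JSON-schema style used by
# OpenAI-compatible chat APIs; the tool name and fields are illustrative.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def build_chat_request(user_message: str) -> dict:
    """Assemble a chat-completion request body with one tool attached."""
    return {
        "model": "mistral-large-2407",  # assumed API name for Mistral Large 2
        "messages": [{"role": "user", "content": user_message}],
        "tools": [weather_tool],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

payload = build_chat_request("What's the weather in Paris?")
print(json.dumps(payload, indent=2))
```

When the model decides to call the tool, the response carries the function name and JSON arguments for the client to execute, rather than a plain text answer.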

Key Specifications

Parameters
123.0B
Context
128.0K
Release Date
July 24, 2024
Average Score
87.6%

Timeline

Key dates in the model's history
Announcement
July 24, 2024
Last Update
July 19, 2025

Technical Specifications

Parameters
123.0B
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$2.00
Output (per 1M tokens)
$6.00
Max Input Tokens
128.0K
Max Output Tokens
128.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
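Given the per-token prices listed above, the cost of a request is a simple linear function of its input and output token counts. A minimal sketch:

```python
INPUT_PRICE_PER_M = 2.00   # USD per 1M input tokens (from the pricing table)
OUTPUT_PRICE_PER_M = 6.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 100k-token prompt with a 10k-token completion
print(f"${request_cost(100_000, 10_000):.2f}")  # → $0.26
```

Note the 3x premium on output tokens: for long completions, output cost dominates even when the prompt fills most of the 128k context window.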

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
MMLU
Accuracy — In LLM evaluation, accuracy measures how often the model answers questions or solves tasks correctly, computed as the proportion of correct answers out of the total number of questions. Responses are typically scored against reference answers, making accuracy useful for evaluating, verifying, and comparing models. It indicates how well a model answers questions in a specific domain or across varied contexts, revealing its strengths and limitations and highlighting areas that need further improvement.
Self-reported
84.0%
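The accuracy computation described above reduces to correct answers divided by total questions. A minimal sketch, using hypothetical multiple-choice answers rather than real MMLU data:

```python
def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match the reference answers."""
    assert len(predictions) == len(references) and references
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

# Illustrative answer letters, not real benchmark data
preds = ["A", "C", "B", "D", "A"]
refs  = ["A", "C", "D", "D", "A"]
print(f"{accuracy(preds, refs):.0%}")  # → 80%
```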

Programming

Programming skills tests
HumanEval
Pass@1 — Pass@1 measures the percentage of tasks the model solves successfully on its first attempt. This directly scores the quality of the model's work without any opportunity for corrections or retries. The metric is especially important for scenarios where users need an immediate, correct answer, or where time or system constraints rule out multiple attempts. Under Pass@1: the model receives each task once; it produces one solution; that solution is judged correct or incorrect; and the final score is the percentage of tasks answered correctly. A high Pass@1 score indicates a reliable model that can find solutions without additional iterations. The metric is often used when analyzing different models to establish their baseline performance.
Self-reported
92.0%
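As described, Pass@1 is simply the fraction of tasks whose single generated solution passes its tests. A minimal sketch over per-task pass/fail results (illustrative data, not actual HumanEval outcomes):

```python
def pass_at_1(first_attempt_passed: list[bool]) -> float:
    """Fraction of tasks solved correctly on the single first attempt."""
    assert first_attempt_passed
    return sum(first_attempt_passed) / len(first_attempt_passed)

# One boolean per task: did the model's first solution pass the task's tests?
results = [True, True, False, True, True]  # illustrative only
print(f"{pass_at_1(results):.0%}")  # → 80%
```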

Mathematics

Mathematical problems and computations
GSM8k
Accuracy — AI-generated answers can sound plausible but still be incorrect. To measure how well LLMs produce factually accurate answers, we present them with test questions that have known answers and assess the percentage of their responses that are correct. Benchmarks include:
- General Knowledge: MMLU, a comprehensive test covering 57 subjects from STEM to humanities.
- Advanced Knowledge: GPQA (Graduate-level Professional Questions & Answers), which tests expert-level knowledge.
- Mathematical Reasoning: GSM8K for grade-school math problems, and MATH for competition-level problems.
This metric helps evaluate whether models can reliably generate factual information rather than just producing coherent-sounding text. High accuracy suggests an LLM can be trusted to provide correct information within its training domain.
Self-reported
93.0%

Other Tests

Specialized benchmarks
MMLU French
Accuracy — Proportion of questions the model answers correctly.
Self-reported
82.8%
MT-Bench
Score — Evaluation score.
Self-reported
86.3%

License & Metadata

License
mistral_research_license
Announcement Date
July 24, 2024
Last Updated
July 19, 2025
