Mistral Large 2
A 123 billion parameter model with strong capabilities in code generation, math, and reasoning. Features improved multilingual support for dozens of languages, a 128k context window, and advanced function calling capabilities. Excels at instruction following and delivers concise results.
Key Specifications
Parameters
123.0B
Context
128.0K
Release Date
July 24, 2024
Average Score
87.6%
Timeline
Key dates in the model's history
Announcement
July 24, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
123.0B
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$2.00
Output (per 1M tokens)
$6.00
Max Input Tokens
128.0K
Max Output Tokens
128.0K
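At the listed rates, the price of a request is simple arithmetic; a minimal sketch (illustrative helper, not an official billing tool):

```python
# Illustrative cost estimate at the listed rates for Mistral Large 2:
# $2.00 per 1M input tokens, $6.00 per 1M output tokens.
INPUT_USD_PER_M = 2.00
OUTPUT_USD_PER_M = 6.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at the listed prices."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

# Example: a 10,000-token prompt with a 1,000-token completion
print(f"${request_cost(10_000, 1_000):.4f}")  # -> $0.0260
```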
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
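Function calling, the first feature listed above, lets the model return a structured tool invocation instead of free text. Below is a minimal sketch against Mistral's chat completions endpoint; the get_weather tool and its schema are hypothetical, the model identifier is assumed, and the payload shape should be verified against the current API docs:

```python
import os
import requests

# Hypothetical tool definition; the "tools" schema follows the
# OpenAI-style function-calling convention that Mistral's API accepts.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-large-2407",  # assumed API id for Mistral Large 2
        "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
        "tools": tools,
    },
    timeout=30,
)
resp.raise_for_status()
# If the model opts to call the tool, the message carries "tool_calls"
# with JSON arguments instead of plain text content.
print(resp.json()["choices"][0]["message"].get("tool_calls"))
```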
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
Accuracy
In LLM evaluation, accuracy measures the model's ability to answer questions or solve tasks correctly. It is computed as the proportion of correct answers out of the total number of questions, typically by verifying outputs against reference answers. Accuracy matters because it shows how well a model performs in a specific domain or across varied contexts; it helps identify the model's strengths and limitations and highlights areas that need further improvement. • Self-reported
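In code, the metric reduces to a simple ratio; a minimal sketch with hypothetical multiple-choice predictions (exact-match scoring assumed):

```python
def accuracy(predictions: list[str], references: list[str]) -> float:
    """Proportion of predictions that exactly match the reference answers."""
    assert len(predictions) == len(references)
    return sum(p == r for p, r in zip(predictions, references)) / len(references)

# Hypothetical MMLU-style multiple-choice outputs (answer letters A-D)
print(accuracy(["B", "C", "A", "D"], ["B", "C", "B", "D"]))  # -> 0.75
```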
Programming
Programming skills tests
HumanEval
Pass@1
The Pass@1 evaluation method measures the percentage of tasks the model solves successfully on its first attempt. It is a direct measure of output quality with no opportunity for corrections or retries, which makes it especially important for scenarios where users need an exact answer immediately, or where repeated attempts are ruled out by time or system constraints. Under Pass@1:
• The model receives each task exactly once
• It generates a single solution
• The solution is judged correct or incorrect
• The final score is the percentage of correct solutions across all tasks
A high Pass@1 score indicates a reliable model that can find solutions without additional iterations. The metric is often used when comparing models to establish baseline performance. • Self-reported
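With one sample per task, Pass@1 is just the solve rate; the unbiased pass@k estimator from Chen et al. (2021) generalizes this when several samples per task are available. A minimal sketch with hypothetical outcomes:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the probability that
    at least one of k samples, drawn from n generations of which c are
    correct, passes the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With a single sample per task, pass@1 reduces to the plain solve rate:
results = [True, False, True, True]  # hypothetical per-task outcomes
print(sum(results) / len(results))   # -> 0.75
print(pass_at_k(n=10, c=4, k=1))     # -> 0.4 with 10 samples, 4 correct
```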
Mathematics
Mathematical problems and computations
GSM8k
Accuracy
AI-generated answers can sound plausible but still be incorrect. To measure how well LLMs can produce factually accurate answers, we can present them with test questions with known answers, and assess the percentage of their responses that are correct.
Benchmarks:
- General Knowledge: MMLU, a comprehensive test covering 57 subjects from STEM to humanities.
- Advanced Knowledge: GPQA (Graduate-Level Google-Proof Q&A), which tests expert-level knowledge.
- Mathematical Reasoning: GSM8K for grade school math problems, and MATH for competition-level problems.
This metric helps evaluate if models can reliably generate factual information rather than just producing coherent-sounding text. High accuracy suggests an LLM can be trusted to provide correct information within its training domain. • Self-reported
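For GSM8K-style problems, grading typically means extracting the final number from the model's free-form reasoning and comparing it to the known reference; a minimal sketch (regex heuristic, hypothetical example):

```python
import re

def final_number(text: str) -> str | None:
    """Take the last number in the model's output as its final answer."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None

# Hypothetical GSM8K-style check against a known reference answer
output = "She buys 3 packs of 12 eggs, so 3 * 12 = 36 eggs. Answer: 36"
print(final_number(output) == "36")  # -> True
```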
Other Tests
Specialized benchmarks
MMLU French
Accuracy
The proportion of questions for which the model gives the correct answer, here measured on the French-language version of MMLU. • Self-reported
MT-Bench
Score
Multi-turn conversation quality rated by an LLM judge. • Self-reported
License & Metadata
License
Mistral Research License
Announcement Date
July 24, 2024
Last Updated
July 19, 2025
Similar Models
Codestral-22B
Mistral AI
22.2B
Best score: 0.8 (HumanEval)
Released: May 2024
Price: $0.20/1M tokens
LongCat-Flash-Thinking-2601
Meituan
560.0B
Best score: 1.0 (TAU)
Released: Jan 2026
DeepSeek-R1
DeepSeek
671.0B
Best score: 0.9 (MMLU)
Released: Jan 2025
Price: $3.00/1M tokens
GLM-5
Zhipu AI
744.0B
Best score: 0.9 (TAU)
Released: Feb 2026
GLM-4.7
Zhipu AI
358.0B
Best score: 0.9 (TAU)
Released: Dec 2025
Price: $0.60/1M tokens
DeepSeek-V2.5
DeepSeek
236.0B
Best score: 0.9 (HumanEval)
Released: May 2024
Price: $2.00/1M tokens
Kimi K2-Thinking-0905
Moonshot AI
1.0T
Best score: 0.8 (GPQA)
Released: Sep 2025
Price: $0.60/1M tokens
Nemotron 3 Super (120B A12B)
NVIDIA
120.0B
Best score: 0.8 (GPQA)
Released: Mar 2026
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.