Key Specifications
Parameters
22.2B
Context
32.8K
Release Date
May 29, 2024
Average Score
65.9%
Timeline
Key dates in the model's history
Announcement
May 29, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
22.2B
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$0.20
Output (per 1M tokens)
$0.60
Max Input Tokens
32.8K
Max Output Tokens
32.8K
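At the listed rates, per-request cost is straightforward arithmetic. Below is a minimal sketch assuming the prices shown above; check the provider's current pricing before relying on it.

```python
# Cost sketch at the listed rates: $0.20 per 1M input tokens,
# $0.60 per 1M output tokens (prices as shown on this page).

INPUT_PER_1M = 0.20   # USD per 1M input tokens
OUTPUT_PER_1M = 0.60  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1_000_000) * INPUT_PER_1M + \
           (output_tokens / 1_000_000) * OUTPUT_PER_1M

# Example: a 20K-token prompt with a 2K-token completion:
# 0.02 * $0.20 + 0.002 * $0.60 = $0.0052
print(f"${request_cost(20_000, 2_000):.4f}")  # $0.0052
```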
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
Programming
Programming skills tests
HumanEval
pass@1 — accuracy when the model is given a single attempt per task. It measures the proportion of problems the model solves correctly on the first try, with no opportunity to revise the solution or to pick among several candidate answers. This metric is particularly important for assessing a model's baseline ability in settings where users rely on the first response and there is no room for multiple iterations or for choosing the best of several generated options. • Self-reported
MBPP
pass@1 — the probability that the model solves a task or produces the correct answer on its first attempt. Unlike plain accuracy, which grades a single answer as correct or incorrect, pass@1 is estimated from repeated sampling: for exact-answer or code-generation tasks, the model generates n independent answers to the same question, and if k of them are correct, pass@1 = k/n. This approach evaluates not only whether the model can find a correct solution but also how consistently it does so. • Self-reported
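As a rough illustration of the pass@1 = k/n estimate described above, here is a minimal sketch; `generate_answer` and `passes_tests` are hypothetical placeholders for the model call and the benchmark's test harness, not part of any published evaluation code.

```python
# Minimal sketch of a sampling-based pass@1 estimate.
# `generate_answer` and `passes_tests` are hypothetical placeholders
# for the model call and the benchmark's test harness.

def pass_at_1(tasks, generate_answer, passes_tests, n=20):
    """Mean over tasks of k/n, where k of n sampled answers are correct."""
    per_task = []
    for task in tasks:
        samples = [generate_answer(task) for _ in range(n)]
        k = sum(passes_tests(task, s) for s in samples)
        per_task.append(k / n)
    return sum(per_task) / len(per_task)
```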
Other Tests
Specialized benchmarks
CruxEval-O
pass@1 — the model makes a single attempt, with no tools, retries, or answer verification: a strictly one-shot evaluation. The model receives the task and returns one answer. For some task types, such as mathematical equations or puzzles, pass@1 can be demanding for LLMs, and for other task types this method can understate true capability. • Self-reported
HumanEval-Average
Pass@1 — the proportion of tasks the model solves on the first attempt. It reflects the probability that the model's first answer is correct. Under Pass@1 the model generates one answer per task, and the task counts as solved only if that answer passes. This is a strict metric, since it requires the model to succeed on the first try with no opportunity to correct or refine its answer. Pass@1 is especially useful for assessing a model's baseline ability and accuracy in scenarios where the user acts on the first answer without further interaction; to raise Pass@1 scores, models often favor more conservative but precise answers. • Self-reported
HumanEvalFIM-Average
pass@1 — a metric for evaluating the quality of LLM answers on programming tasks: the fraction of tasks solved correctly on the first attempt. To compute it: (1) obtain the model's answer to a programming task, (2) run that answer against the task's test cases, (3) check whether the answer passes all tests. For example, if the model correctly solves 75 of 100 tasks on the first attempt, pass@1 = 0.75, or 75%. Unlike metrics such as pass@k, which allow the model several attempts and keep the best result, pass@1 evaluates the ability to generate a correct answer on the first try, which matters in single-shot usage scenarios. • Self-reported
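The pass@k contrast mentioned above is usually estimated with the unbiased formula from the original HumanEval paper (Chen et al., 2021): draw n samples per task, count the c that pass, and compute 1 - C(n-c, k)/C(n, k). A minimal sketch:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021):
    probability that at least one of k answers drawn from
    n samples, c of which are correct, passes the tests."""
    if n - c < k:
        return 1.0  # every size-k draw contains a correct answer
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# With k = 1 this reduces to c / n, matching the pass@1 = k/n
# formula quoted in the metric descriptions above.
```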
RepoBench
pass@1 — the AI system attempts each question once; a correct first answer is scored as 1, an incorrect one as 0. The pass@1 score is the proportion of questions the system answers correctly on the first attempt. • Self-reported
Spider
pass@1 — solved on the first attempt. • Self-reported
License & Metadata
License
MNPL-0.1 (Mistral Non-Production License)
Announcement Date
May 29, 2024
Last Updated
July 19, 2025
Similar Models
Devstral Small 1.1
Mistral AI
24.0B
Released: Jul 2025
Price: $0.10/1M tokens
Mistral Small
Mistral AI
22.0B
Released: Sep 2024
Price: $0.20/1M tokens
Mistral Small 3 24B Instruct
Mistral AI
24.0B
Best score: 0.8 (HumanEval)
Released: Jan 2025
Price: $0.10/1M tokens
Mistral NeMo Instruct
Mistral AI
12.0B
Best score: 0.7 (MMLU)
Released: Jul 2024
Price: $0.15/1M tokens
Magistral Small 2506
Mistral AI
24.0B
Best score: 0.7 (GPQA)
Released: Jun 2025
Mistral Large 2
Mistral AI
123.0B
Best score: 0.9 (HumanEval)
Released: Jul 2024
Price: $2.00/1M tokens
Phi 4
Microsoft
14.7B
Best score: 0.8 (MMLU)
Released: Dec 2024
Price: $0.07/1M tokens
GLM-4.7-Flash
Zhipu AI
30.0B
Best score: 0.8 (TAU)
Released: Jan 2026
Price: $0.07/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.
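As a loose illustration only (not the catalog's actual algorithm), a toy similarity score over those four characteristics might look like the sketch below; the weights and scaling are arbitrary assumptions.

```python
# Toy similarity score over the characteristics named above:
# developer, multimodality, parameter count, benchmark score.
# Weights and scaling are illustrative assumptions, not the
# catalog's real recommendation logic.

from dataclasses import dataclass

@dataclass
class Model:
    developer: str
    multimodal: bool
    params_b: float   # parameter count, in billions
    avg_score: float  # average benchmark score in [0, 1]

def similarity(a: Model, b: Model) -> float:
    same_dev = 1.0 if a.developer == b.developer else 0.0
    same_modality = 1.0 if a.multimodal == b.multimodal else 0.0
    size_closeness = 1.0 - min(
        abs(a.params_b - b.params_b) / max(a.params_b, b.params_b), 1.0
    )
    score_closeness = 1.0 - abs(a.avg_score - b.avg_score)
    return 0.3 * same_dev + 0.2 * same_modality \
         + 0.3 * size_closeness + 0.2 * score_closeness
```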