Key Specifications
Parameters
22.2B
Context
32.8K
Release Date
May 29, 2024
Average Score
65.9%
Timeline
Key dates in the model's history
Announcement
May 29, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
22.2B
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$0.20
Output (per 1M tokens)
$0.60
Max Input Tokens
32.8K
Max Output Tokens
32.8K
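At the listed rates, per-request cost is straightforward arithmetic. Below is a minimal sketch assuming the prices shown above; check the provider's current pricing before relying on it.

```python
# Cost sketch at the listed rates: $0.20 per 1M input tokens,
# $0.60 per 1M output tokens (prices as shown on this page).

INPUT_PER_1M = 0.20   # USD per 1M input tokens
OUTPUT_PER_1M = 0.60  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1_000_000) * INPUT_PER_1M + \
           (output_tokens / 1_000_000) * OUTPUT_PER_1M

# Example: a 20K-token prompt with a 2K-token completion:
# 0.02 * $0.20 + 0.002 * $0.60 = $0.0052
print(f"${request_cost(20_000, 2_000):.4f}")  # $0.0052
```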
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
Programming
Programming skills tests
HumanEval
pass@1 — accuracy when the model is given a single attempt per task. It measures the proportion of problems the model solves correctly on the first try, with no opportunity to revise the solution or to pick among several candidate answers. This metric is particularly important for assessing a model's baseline ability in settings where users rely on the first response and there is no room for multiple iterations or for choosing the best of several generated options. • Self-reported
MBPP
pass@1 — the probability that the model solves a task or produces the correct answer on its first attempt. Unlike plain accuracy, which grades a single answer as correct or incorrect, pass@1 is estimated from repeated sampling: for exact-answer or code-generation tasks, the model generates n independent answers to the same question, and if k of them are correct, pass@1 = k/n. This approach evaluates not only whether the model can find a correct solution but also how consistently it does so. • Self-reported
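As a rough illustration of the pass@1 = k/n estimate described above, here is a minimal sketch; `generate_answer` and `passes_tests` are hypothetical placeholders for the model call and the benchmark's test harness, not part of any published evaluation code.

```python
# Minimal sketch of a sampling-based pass@1 estimate.
# `generate_answer` and `passes_tests` are hypothetical placeholders
# for the model call and the benchmark's test harness.

def pass_at_1(tasks, generate_answer, passes_tests, n=20):
    """Mean over tasks of k/n, where k of n sampled answers are correct."""
    per_task = []
    for task in tasks:
        samples = [generate_answer(task) for _ in range(n)]
        k = sum(passes_tests(task, s) for s in samples)
        per_task.append(k / n)
    return sum(per_task) / len(per_task)
```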
Other Tests
Specialized benchmarks
CruxEval-O
pass@1 — the model makes a single attempt, with no tools, retries, or answer verification: a strictly one-shot evaluation. The model receives the task and returns one answer. For some task types, such as mathematical equations or puzzles, pass@1 can be demanding for LLMs, and for other task types this method can understate true capability. • Self-reported
HumanEval-Average
Pass@1 — the proportion of tasks the model solves on the first attempt. It reflects the probability that the model's first answer is correct. Under Pass@1 the model generates one answer per task, and the task counts as solved only if that answer passes. This is a strict metric, since it requires the model to succeed on the first try with no opportunity to correct or refine its answer. Pass@1 is especially useful for assessing a model's baseline ability and accuracy in scenarios where the user acts on the first answer without further interaction; to raise Pass@1 scores, models often favor more conservative but precise answers. • Self-reported
HumanEvalFIM-Average
pass@1 — a metric for evaluating the quality of LLM answers on programming tasks: the fraction of tasks solved correctly on the first attempt. To compute it: (1) obtain the model's answer to a programming task, (2) run that answer against the task's test cases, (3) check whether the answer passes all tests. For example, if the model correctly solves 75 of 100 tasks on the first attempt, pass@1 = 0.75, or 75%. Unlike metrics such as pass@k, which allow the model several attempts and keep the best result, pass@1 evaluates the ability to generate a correct answer on the first try, which matters in single-shot usage scenarios. • Self-reported
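The pass@k contrast mentioned above is usually estimated with the unbiased formula from the original HumanEval paper (Chen et al., 2021): draw n samples per task, count the c that pass, and compute 1 - C(n-c, k)/C(n, k). A minimal sketch:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021):
    probability that at least one of k answers drawn from
    n samples, c of which are correct, passes the tests."""
    if n - c < k:
        return 1.0  # every size-k draw contains a correct answer
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# With k = 1 this reduces to c / n, matching the pass@1 = k/n
# formula quoted in the metric descriptions above.
```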
RepoBench
pass@1 — the AI system attempts each question once; a correct first answer is scored as 1, an incorrect one as 0. The pass@1 score is the proportion of questions the system answers correctly on the first attempt. • Self-reported
Spider
pass@1 — solved on the first attempt. • Self-reported
License & Metadata
License
MNPL-0.1 (Mistral Non-Production License)
Announcement Date
May 29, 2024
Last Updated
July 19, 2025
Similar Models
Devstral Small 1.1
Mistral AI
24.0B
Released: Jul 2025
Price: $0.10/1M tokens
Mistral Small
Mistral AI
22.0B
Released: Sep 2024
Price: $0.20/1M tokens
Mistral Small 3 24B Instruct
Mistral AI
24.0B
Best score: 0.8 (HumanEval)
Released: Jan 2025
Price: $0.10/1M tokens
Mistral NeMo Instruct
Mistral AI
12.0B
Best score: 0.7 (MMLU)
Released: Jul 2024
Price: $0.15/1M tokens
Magistral Small 2506
Mistral AI
24.0B
Best score: 0.7 (GPQA)
Released: Jun 2025
Mistral Large 2
Mistral AI
123.0B
Best score: 0.9 (HumanEval)
Released: Jul 2024
Price: $2.00/1M tokens
Phi 4
Microsoft
14.7B
Best score: 0.8 (MMLU)
Released: Dec 2024
Price: $0.07/1M tokens
GLM-4.7-Flash
Zhipu AI
30.0B
Best score: 0.8 (TAU)
Released: Jan 2026
Price: $0.07/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.
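As a loose illustration only (not the catalog's actual algorithm), a toy similarity score over those four characteristics might look like the sketch below; the weights and scaling are arbitrary assumptions.

```python
# Toy similarity score over the characteristics named above:
# developer, multimodality, parameter count, benchmark score.
# Weights and scaling are illustrative assumptions, not the
# catalog's real recommendation logic.

from dataclasses import dataclass

@dataclass
class Model:
    developer: str
    multimodal: bool
    params_b: float   # parameter count, in billions
    avg_score: float  # average benchmark score in [0, 1]

def similarity(a: Model, b: Model) -> float:
    same_dev = 1.0 if a.developer == b.developer else 0.0
    same_modality = 1.0 if a.multimodal == b.multimodal else 0.0
    size_closeness = 1.0 - min(
        abs(a.params_b - b.params_b) / max(a.params_b, b.params_b), 1.0
    )
    score_closeness = 1.0 - abs(a.avg_score - b.avg_score)
    return 0.3 * same_dev + 0.2 * same_modality \
         + 0.3 * size_closeness + 0.2 * score_closeness
```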