Key Specifications
Parameters
-
Context
128.0K
Release Date
April 9, 2024
Average Score
78.1%
Timeline
Key dates in the model's history
Announcement
April 9, 2024
Last Update
July 19, 2025
Today
March 26, 2026
Technical Specifications
Parameters
-
Training Tokens
-
Knowledge Cutoff
December 31, 2023
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$10.00
Output (per 1M tokens)
$30.00
Max Input Tokens
128.0K
Max Output Tokens
4.1K
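The listed rates translate directly into per-request cost. A minimal sketch (the request sizes below are made up for illustration):

```python
# Cost estimate at the listed rates: $10.00 per 1M input tokens,
# $30.00 per 1M output tokens. Example token counts are hypothetical.
INPUT_RATE = 10.00 / 1_000_000   # USD per input token
OUTPUT_RATE = 30.00 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at the listed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 2,000-token prompt with a 500-token completion.
print(f"${request_cost(2_000, 500):.4f}")  # $0.0350
```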
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
Multiple-choice questions across 57 subjects • Self-reported
Programming
Programming skills tests
HumanEval
Python programming tasks: writing functions to a specification, debugging existing code, and explaining algorithms and data structures • Self-reported
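Each HumanEval problem pairs a Python function signature and docstring with unit tests; a model's completion counts as correct only if every test passes. A minimal sketch of that check (the task and tests here are illustrative, not drawn from the benchmark itself):

```python
# Illustrative HumanEval-style check: the model completes the function
# body, and the completion passes only if all unit tests succeed.
task = '''
def add(a, b):
    """Return the sum of a and b."""
'''

completion = "    return a + b\n"  # hypothetical model output

def passes(task: str, completion: str) -> bool:
    """Execute the candidate solution and run unit tests against it."""
    namespace: dict = {}
    exec(task + completion, namespace)  # build the candidate function
    f = namespace["add"]
    # Checks analogous to the benchmark's per-problem unit tests.
    return f(1, 2) == 3 and f(-1, 1) == 0

print(passes(task, completion))  # True
```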
Mathematics
Mathematical problems and computations
MATH
Solving challenging mathematics problems • Self-reported
MGSM
Grade-school math word problems translated into multiple languages; accuracy is scored on whether the worked solution reaches the correct final answer (e.g., "She had 5 apples, received 7 more, then gave away 3: 5 + 7 = 12, 12 - 3 = 9") • Self-reported
Reasoning
Logical reasoning and analysis
DROP
Reading comprehension with discrete arithmetic reasoning over paragraphs, scored by F1 • Self-reported
GPQA
Graduate-level questions written by domain experts, designed to require reasoning rather than recall of known facts (Rein et al., 2023) • Self-reported
License & Metadata
License
proprietary
Announcement Date
April 9, 2024
Last Updated
July 19, 2025
Similar Models
o1-mini
OpenAI
Best score:0.9 (HumanEval)
Released:Sep 2024
Price:$3.00/1M tokens
o1
OpenAI
Best score:0.9 (MMLU)
Released:Dec 2024
Price:$15.00/1M tokens
o1-preview
OpenAI
Best score:0.9 (MMLU)
Released:Sep 2024
Price:$15.00/1M tokens
o3-mini
OpenAI
Best score:0.9 (MMLU)
Released:Jan 2025
Price:$1.10/1M tokens
GPT-3.5 Turbo
OpenAI
Best score:0.7 (MMLU)
Released:Mar 2023
Price:$0.50/1M tokens
GPT-5 Codex
OpenAI
Released:Sep 2025
Price:$2.00/1M tokens
GPT-4o mini
OpenAI
Multimodal
Best score:0.9 (HumanEval)
Released:Jul 2024
Price:$0.15/1M tokens
GPT-4.1
OpenAI
Multimodal
Best score:0.9 (MMLU)
Released:Apr 2025
Price:$2.00/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.