Key Specifications
Parameters
-
Context
16.4K
Release Date
March 21, 2023
Average Score
42.3%
Timeline
Key dates in the model's history
Announcement
March 21, 2023
Last Update
July 19, 2025
Today
March 26, 2026
Technical Specifications
Parameters
-
Training Tokens
-
Knowledge Cutoff
September 30, 2021
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$0.50
Output (per 1M tokens)
$1.50
Max Input Tokens
16.4K
Max Output Tokens
4.1K
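The per-token prices and token caps above determine the cost of a request. A minimal sketch of the arithmetic, assuming the listed $0.50 / $1.50 per 1M token rates and the 16.4K input / 4.1K output limits (the helper function is illustrative, not part of any official SDK):

```python
# Illustrative cost helper based on the listed pricing:
# $0.50 per 1M input tokens, $1.50 per 1M output tokens.
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 1.50
MAX_INPUT_TOKENS = 16_400   # 16.4K max input
MAX_OUTPUT_TOKENS = 4_100   # 4.1K max output

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request, enforcing the token caps."""
    if input_tokens > MAX_INPUT_TOKENS:
        raise ValueError(f"input exceeds {MAX_INPUT_TOKENS}-token limit")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError(f"output exceeds {MAX_OUTPUT_TOKENS}-token limit")
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A 10,000-token prompt with a 1,000-token reply costs
# (10_000 * 0.50 + 1_000 * 1.50) / 1e6 = $0.0065.
```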
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
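For the function-calling feature, a request declares the tools the model may invoke. The sketch below uses the widely adopted OpenAI-style "tools" schema; the field names and the `get_weather` tool are assumptions for illustration, not taken from this page:

```python
import json

# Illustrative function-calling request body. The "tools" schema below
# follows the common OpenAI-style format; exact field names for any
# particular API are an assumption.
payload = {
    "model": "example-model",  # placeholder, not a real model id
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# The payload serializes to plain JSON for the HTTP request body.
body = json.dumps(payload)
```

The model can then respond with a structured call to `get_weather` instead of free text, which the client executes and feeds back.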
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
Accuracy
AI • Verified
Programming
Programming skills tests
HumanEval
Accuracy AI: GPQA accuracy is 41.4%. By question complexity, the sub-scores are 50.5%, 40.5%, and 33.3%; by subject area, scores range from 36.1% to 43.9%; finer-grained topic scores range from 33.3% to 66.7%. Overall accuracy is below the human baseline (42.2%) but above random chance (25%). • Verified
Mathematics
Mathematical problems and computations
MATH
Accuracy AI: ChatGPT produces simple answers very quickly, but they still need to be checked, as it is sometimes wrong. Evaluation template: Question: [question from the GPQA test set] Answer: [answer from GPQA]. When analyzing an answer, I assess its accuracy by how well it matches the correct answer. Accuracy for these solutions is evaluated as [//]. I give this evaluation because [explanation of the evaluation, with reference to specific aspects of the answer]. [Note whether the model answered the question correctly, whether its answer contains the needed information, and whether it is sufficiently complete.] • Verified
MGSM
Accuracy
AI • Verified
Reasoning
Logical reasoning and analysis
DROP
Accuracy
AI: 64.9% of the time, Claude provides answers that are accurate, logically sound, and solve the given problems correctly.
35.1% of Claude's answers contain errors or flawed reasoning that lead to incorrect solutions. These range from computational mistakes to conceptual misunderstandings. • Verified
GPQA
Accuracy • Verified
Multimodal
Working with images and visual data
MathVista
Accuracy AI: Progress is still uneven, but I see Stability AI and Anthropic making large strides. Gorilla-level models achieve higher accuracy in API use, and Anthropic states that Claude can follow instructions more precisely. I expect answer accuracy to keep improving. • Verified
MMMU
Accuracy AI: The model should carry out a sequence of steps in order to obtain the correct answer. Does it generate steps that are correct from a mathematical point of view? During reasoning the model can make errors, such as computational errors or flaws in the reasoning itself. Human: Each step must be correct in order to reach the right answer. The model should generate those steps, and during reasoning it can make errors, for example computational errors or errors in reasoning. • Verified
License & Metadata
License
proprietary
Announcement Date
March 21, 2023
Last Updated
July 19, 2025
Similar Models
o3-mini
OpenAI
Best score: 0.9 (MMLU)
Released: Jan 2025
Price: $1.10/1M tokens
GPT-5 Codex
OpenAI
Released: Sep 2025
Price: $2.00/1M tokens
o1-preview
OpenAI
Best score: 0.9 (MMLU)
Released: Sep 2024
Price: $15.00/1M tokens
GPT-4 Turbo
OpenAI
Best score: 0.9 (HumanEval)
Released: Apr 2024
Price: $10.00/1M tokens
o1-mini
OpenAI
Best score: 0.9 (HumanEval)
Released: Sep 2024
Price: $3.00/1M tokens
o1
OpenAI
Best score: 0.9 (MMLU)
Released: Dec 2024
Price: $15.00/1M tokens
GPT-4.1 mini
OpenAI
Best score: 0.9 (MMLU)
Released: Apr 2025
Price: $0.40/1M tokens
Claude 3.5 Haiku
Anthropic
Best score: 0.9 (HumanEval)
Released: Oct 2024
Price: $0.80/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.
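The similarity criteria named above can be sketched as a simple scoring function. This is a toy illustration under assumed weights and features, not the site's actual recommendation algorithm:

```python
# Toy similarity score over the criteria named above: developer
# organization, multimodality, and benchmark performance.
# The 0.4 / 0.2 / 0.4 weights are arbitrary illustrative choices.
def similarity(a: dict, b: dict) -> float:
    score = 0.0
    if a["developer"] == b["developer"]:
        score += 0.4
    if a["multimodal"] == b["multimodal"]:
        score += 0.2
    # Closer best-benchmark scores -> higher similarity.
    score += 0.4 * (1.0 - abs(a["best_score"] - b["best_score"]))
    return score

# Hypothetical feature vectors for the base model and two candidates.
base = {"developer": "OpenAI", "multimodal": True, "best_score": 0.7}
candidates = [
    {"name": "o3-mini", "developer": "OpenAI",
     "multimodal": True, "best_score": 0.9},
    {"name": "Claude 3.5 Haiku", "developer": "Anthropic",
     "multimodal": True, "best_score": 0.9},
]
ranked = sorted(candidates, key=lambda m: similarity(base, m), reverse=True)
```

With these weights, a same-developer model outranks an otherwise similar model from a different organization.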