Key Specifications
Parameters
-
Context
-
Release Date
March 28, 2024
Average Score
63.9%
Timeline
Key dates in the model's history
Announcement
March 28, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal
ZeroEval
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
5-shot • Self-reported
Programming
Programming skills tests
HumanEval
Self-reported
Mathematics
Mathematical problems and computations
GSM8k
8-shot • Self-reported
MATH
4-shot • Self-reported
Reasoning
Logical reasoning and analysis
GPQA
0-shot: the model performs the task without any examples or task-specific demonstrations. This is the most direct test of a model's abilities, since the only information the model receives is the task it must complete. Zero-shot evaluation is especially useful for assessing a model's general capabilities, but it can be difficult for tasks that require a specific answer format or instructions that were not explicitly provided. • Self-reported
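The zero-shot vs. few-shot distinction described above can be sketched in a few lines. This is a minimal, hypothetical illustration (the helper name and prompt format are assumptions, not any specific harness's API):

```python
# Sketch: 0-shot vs. k-shot prompt construction.
# The format below (Q:/A: pairs) is a common convention, not a fixed standard.

def build_prompt(question: str, examples: list[tuple[str, str]] = ()) -> str:
    """Build an evaluation prompt. With no examples this is 0-shot;
    with k (question, answer) pairs it becomes k-shot."""
    parts = []
    for q, a in examples:  # demonstrations, if any
        parts.append(f"Q: {q}\nA: {a}")
    parts.append(f"Q: {question}\nA:")  # the target question, answer left open
    return "\n\n".join(parts)

zero_shot = build_prompt("What is 2 + 2?")
five_shot = build_prompt(
    "What is 2 + 2?",
    examples=[(f"What is {i} + {i}?", str(2 * i)) for i in range(1, 6)],
)
print(zero_shot.count("Q:"))  # 1 — only the target question
print(five_shot.count("Q:"))  # 6 — five demonstrations plus the target
```

In the 0-shot case the model sees only the bare question, which is why this mode stresses generalization rather than pattern-following.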
Multimodal
Working with images and visual data
DocVQA
shot: in this mode we test the model's ability to answer questions correctly without examples; the model is given the question with a simple prompt and must produce the answer directly. This lets us better gauge the model's question-answering capability in this domain. • Self-reported
MathVista
0-shot: in 0-shot testing the model must solve the task relying only on its training, without any specific examples or solutions to similar problems. This lets us evaluate the model's ability to solve tasks without special instructions. It is arguably the most demanding test mode, since the model is given no additional prompts or context for solving the problems. In our context, 0-shot testing probes the model's basic capabilities in mathematical reasoning and its ability to apply previously acquired knowledge to new tasks without additional training. • Self-reported
MMMU
An analysis by Anthropic of the performance of Claude 3 Opus on these tasks. Claude 3 Opus was compared with Claude 2 and Claude 3 Sonnet to evaluate their performance on mathematical tasks and to gauge how much of an improvement the step from Claude 2 to Claude 3 Opus delivered. Using the same tasks and instruction ("Solve step-by-step"), all models were queried. The results showed that Claude 3 Opus significantly outperforms both Claude 2 and Claude 3 Sonnet in accuracy on complex tasks: Claude 2 made errors on the test tasks and solutions; Claude 3 Sonnet showed some improvement over Claude 2 but still made errors on complex tasks; Claude 3 Opus solved practically all tasks correctly. These results support Anthropic's claims about the mathematical abilities of the new Claude 3 models, especially Opus. • Self-reported
Other Tests
Specialized benchmarks
MMLU-Pro
0-shot: the zero-shot approach means the model is given no examples of how to perform the specific task before solving it; instead it receives only the instructions for the task to be performed. For example, a model might receive the instruction "state whether this statement is true or false" without any examples or demonstrations. The zero-shot approach is especially important when evaluating the abilities of language models, because it measures their understanding and generalization rather than simply their ability to follow patterns from examples. It also makes the evaluation more realistic, since in real usage scenarios the model does not receive examples in advance. When language models such as GPT-4 are evaluated zero-shot, the results show how well they can apply their general knowledge and understanding to new tasks without additional training or tuning. • Self-reported
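A self-reported benchmark score, and an aggregate like the "Average Score" listed at the top of this card, are typically computed as the fraction of correct answers per benchmark followed by a plain mean. This is a hypothetical sketch (the data, the exact-match grading, and the unweighted average are assumptions; real benchmarks use their own graders and weighting):

```python
# Sketch: per-benchmark accuracy and a simple average across benchmarks.

def accuracy(predictions: list[str], references: list[str]) -> float:
    """Exact-match accuracy: fraction of predictions equal to the reference."""
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)

# Toy data for illustration only.
scores = {
    "MMLU": accuracy(["Paris", "4"], ["Paris", "5"]),  # 1 of 2 correct -> 0.5
    "GSM8k": accuracy(["12", "7"], ["12", "7"]),       # 2 of 2 correct -> 1.0
}
average = sum(scores.values()) / len(scores)
print(f"{average:.1%}")  # 75.0%
```

In practice each benchmark has its own grading rules (multiple-choice letter matching, numeric tolerance, unit-test pass rates for code), so exact-match is only the simplest case.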
License & Metadata
License
proprietary
Announcement Date
March 28, 2024
Last Updated
July 19, 2025
Similar Models
Grok Code Fast 1
xAI
Released: Aug 2025
Price: $0.20/1M tokens
Mercury 2
Inception
Best score: 0.7 (GPQA)
Released: Feb 2026
Gemini Diffusion
Best score: 0.9 (HumanEval)
Released: May 2025
Qwen3 Max
Alibaba
Best score: 0.6 (GPQA)
Released: Dec 2025
Grok-3 Mini
xAI
MM
Best score: 0.8 (GPQA)
Released: Feb 2025
Price: $0.30/1M tokens
Grok-3
xAI
MM
Best score: 0.8 (GPQA)
Released: Feb 2025
Price: $3.00/1M tokens
Grok-4 Heavy
xAI
MM
Best score: 0.9 (GPQA)
Released: Jul 2025
Grok 4 Fast
xAI
MM
Best score: 0.9 (GPQA)
Released: Aug 2025
Price: $0.20/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.