Grok-1.5

xAI

An advanced language model with improved reasoning capabilities, particularly strong on coding and math tasks. It features a 128K-token context window and better problem-solving abilities than its predecessor.

Key Specifications

Parameters
-
Context
128K tokens
Release Date
March 28, 2024
Average Score
63.9%

Timeline

Key dates in the model's history
Announcement
March 28, 2024
Last Update
July 19, 2025

Technical Specifications

Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
MMLU
5-shot, Self-reported
81.3%

Programming

Programming skills tests
HumanEval
Self-reported
74.1%

Mathematics

Mathematical problems and computations
GSM8k
8-shot, Self-reported
90.0%
MATH
4-shot, Self-reported
50.6%

Reasoning

Logical reasoning and analysis
GPQA
0-shot, Self-reported. In the 0-shot setting the model performs the task without any examples or additional instructions. It is the most direct test of a model's abilities, because the only information the model receives is the task itself. The setting is especially useful for evaluating a model's capabilities, but it can be harder on tasks that require a specific answer format or instructions that were not stated explicitly.
35.9%
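The difference between the 0-shot and k-shot settings named in these results can be sketched as a prompt-building step. This is a minimal illustration only: `build_prompt` is a hypothetical helper, and the "Q:/A:" layout is an assumed format, not the evaluation harness behind the self-reported scores.

```python
def build_prompt(question, examples=()):
    """Build a k-shot prompt, where k = len(examples).

    With no examples this is a 0-shot prompt: the model sees only the task.
    The "Q:/A:" layout is an assumption made for illustration.
    """
    # Each worked example is shown in full, answer included.
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    # The final question is left open for the model to answer.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# 0-shot: only the question itself.
zero_shot = build_prompt("What is 17 + 25?")

# 2-shot: two worked examples precede the question.
two_shot = build_prompt(
    "What is 17 + 25?",
    examples=[("What is 2 + 3?", "5"), ("What is 10 + 4?", "14")],
)

print(zero_shot)
print("---")
print(two_shot)
```

An 8-shot or 4-shot run, as listed for GSM8k and MATH, would pass eight or four worked examples in the same way.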

Multimodal

Working with images and visual data
DocVQA
0-shot, Self-reported. In this mode we test the model's ability to answer questions correctly without examples: we pose a question and simply ask for the answer. This gives a clearer picture of how well the model answers questions in the field. Example questions involve the function f(x) = x^3 + 2x^2 - 5x + 7 and the equation 3x + 5 = 2x - 7.
85.6%
MathVista
0-shot, Self-reported. In 0-shot testing the model must solve a task relying only on its training, without any specific examples or solutions to similar problems. This evaluates the model's ability to solve tasks without special instructions, and it is the hardest way to test a model, since no additional prompts or context are provided. Here, 0-shot testing measures the model's basic capabilities in mathematical reasoning and its ability to apply previously acquired knowledge to new tasks without additional training.
52.8%
MMMU
Self-reported
53.6%

Other Tests

Specialized benchmarks
MMLU-Pro
0-shot, Self-reported. The zero-shot approach means the model is shown no examples of a task before solving it; it is given only the task instructions. For example, the model might be asked to judge a statement without any examples. This approach is especially important when evaluating language models, since it measures their understanding and generalization rather than their ability to copy from examples. It also makes the evaluation more realistic, because in real use models rarely receive examples in advance. When a language model such as GPT-4 is evaluated zero-shot, the result shows how well it can apply its general knowledge to new tasks without additional training or tuning.
51.0%
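The Average Score of 63.9% listed under Key Specifications is consistent with an unweighted mean of the nine benchmark scores above. The sketch below checks that arithmetic; the equal weighting and one-decimal rounding are assumptions, since the page does not state how the average is computed, and `average_score` is a hypothetical helper name.

```python
from statistics import mean

# Scores copied from the Benchmark Results section, in percent.
scores = {
    "MMLU": 81.3,
    "HumanEval": 74.1,
    "GSM8k": 90.0,
    "MATH": 50.6,
    "GPQA": 35.9,
    "DocVQA": 85.6,
    "MathVista": 52.8,
    "MMMU": 53.6,
    "MMLU-Pro": 51.0,
}


def average_score(results):
    """Unweighted mean of all scores, rounded to one decimal (assumed)."""
    return round(mean(results.values()), 1)


print(f"{average_score(scores)}%")  # prints 63.9%
```

Because every benchmark carries equal weight here, the low GPQA score pulls the average down as much as the high GSM8k score pulls it up.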

License & Metadata

License
proprietary
Announcement Date
March 28, 2024
Last Updated
July 19, 2025
