Grok-1.5V

Multimodal
xAI

A multimodal model that processes text and visual information, including documents, diagrams, charts, screenshots, and photos. It also has strong real-world spatial understanding, which the RealWorldQA benchmark below measures.

Key Specifications

Parameters
-
Context
-
Release Date
April 12, 2024
Average Score
71.9%

Timeline

Key dates in the model's history
Announcement
April 12, 2024
Last Update
July 19, 2025

Technical Specifications

Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval

Benchmark Results

Model performance metrics across various tests and benchmarks

Multimodal

Working with images and visual data
AI2D
Zero-shot evaluation, self-reported
88.3%
ChartQA
Zero-shot evaluation, self-reported
76.1%
DocVQA
Zero-shot evaluation, self-reported
85.6%
MathVista
Zero-shot evaluation, self-reported
52.8%
MMMU
Zero-shot evaluation, self-reported
53.6%

Other Tests

Specialized benchmarks
RealWorldQA
Zero-shot evaluation, self-reported
68.7%
TextVQA
Zero-shot evaluation, self-reported
78.1%
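
The 71.9% average score listed under Key Specifications is consistent with an unweighted mean of the seven benchmark scores above. The page does not say how the average is computed, so the equal weighting in this minimal sketch is an assumption, and the variable names are illustrative only.

```python
# Self-reported zero-shot scores for Grok-1.5V, copied from the tables above.
scores = {
    "AI2D": 88.3,
    "ChartQA": 76.1,
    "DocVQA": 85.6,
    "MathVista": 52.8,
    "MMMU": 53.6,
    "RealWorldQA": 68.7,
    "TextVQA": 78.1,
}

# Assumption: every benchmark carries equal weight in the average.
average = sum(scores.values()) / len(scores)
rounded = round(average, 1)

print(f"Average over {len(scores)} benchmarks: {rounded}%")
```

Rounded to one decimal place, the result matches the listed figure, which suggests that all seven benchmarks (both the Multimodal and the Other Tests groups) contribute to it.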

License & Metadata

License
Proprietary
Announcement Date
April 12, 2024
Last Updated
July 19, 2025