IBM Granite 4.0 Tiny Preview

Granite 4.0 Tiny Preview is a compact language model from IBM's Granite family, designed for resource-constrained deployments. Despite its small size, it delivers competitive performance in reasoning, coding, and instruction following, and is optimized for edge computing and low-latency applications.
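
For readers who want to try the model locally, below is a minimal sketch of loading it for inference with Hugging Face `transformers`. The model id `ibm-granite/granite-4.0-tiny-preview` is an assumption (check IBM's official model card for the exact repository name); the generation settings are illustrative only.

```python
# Minimal local-inference sketch. The model id below is assumed, not confirmed
# by this page -- verify it against IBM's Hugging Face organization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-tiny-preview"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision suits constrained hardware
    device_map="auto",           # place weights on the available device(s)
)

messages = [{"role": "user", "content": "Explain what a context window is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```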

Key Specifications

Parameters
7.0B
Context
-
Release Date
May 2, 2025
Average Score
57.1%

Timeline

Key dates in the model's history
Announcement
May 2, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
7.0B
Training Tokens
2.5T tokens
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
MMLU
Self-reported
60.4%
TruthfulQA
Self-reported
58.1%

Programming

Programming skills tests
HumanEval
Self-reported
82.4%

Mathematics

Mathematical problems and computations
GSM8k
Self-reported
70.1%

Reasoning

Logical reasoning and analysis
BIG-Bench Hard
Self-reported
55.7%
DROP
Self-reported
46.2%

Other Tests

Specialized benchmarks
AlpacaEval 2.0
Self-reported
35.2%
Arena Hard
Self-reported
26.7%
AttaQ
Self-reported
86.1%
HumanEval+
Self-reported
78.3%
IFEval
Score : "" (), "" (), "" () "" (). : 1. (solve): 1, 0. 2. (eval_correct): 1, 3. (eval_incorrect): 1, : solve, eval_correct eval_incorrectSelf-reported
63.0%
PopQA
Self-reported
22.9%
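
The headline "Average Score" of 57.1% in Key Specifications appears to be the unweighted mean of the twelve self-reported results above; a minimal check (the assumption that all twelve benchmarks are weighted equally is mine, not stated on this page):

```python
# Sanity check: unweighted mean of the twelve self-reported benchmark scores.
scores = {
    "MMLU": 60.4, "TruthfulQA": 58.1, "HumanEval": 82.4, "GSM8k": 70.1,
    "BIG-Bench Hard": 55.7, "DROP": 46.2, "AlpacaEval 2.0": 35.2,
    "Arena Hard": 26.7, "AttaQ": 86.1, "HumanEval+": 78.3,
    "IFEval": 63.0, "PopQA": 22.9,
}
average = sum(scores.values()) / len(scores)
print(f"Average score: {average:.1f}%")  # -> 57.1%
```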

License & Metadata

License
Apache 2.0
Announcement Date
May 2, 2025
Last Updated
July 19, 2025

Similar Models

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.