IBM Granite 4.0 Tiny Preview
Granite 4.0 Tiny Preview is a compact language model from IBM's Granite family, designed for resource-constrained deployments. Despite its small size, it delivers competitive performance in reasoning, coding, and instruction following, and is optimized for edge computing and low-latency applications.
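For readers who want to try the model, the sketch below shows a minimal local inference loop with Hugging Face transformers. The repository id ibm-granite/granite-4.0-tiny-preview is an assumption based on IBM's naming for earlier Granite releases; confirm it against the official model card before use.

```python
# Minimal sketch: run Granite 4.0 Tiny Preview locally via transformers.
# The repo id below is an assumption; check IBM's Hugging Face page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-tiny-preview"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision suits constrained hardware
    device_map="auto",
)

# Instruct-style Granite models ship a chat template; apply_chat_template
# formats the conversation the way the model expects.
messages = [{"role": "user", "content": "Summarize what a context window is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```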
Key Specifications
Parameters
7.0B
Context
-
Release Date
May 2, 2025
Average Score
57.1%
Timeline
Key dates in the model's history
Announcement
May 2, 2025
Last Update
July 19, 2025
Technical Specifications
Parameters
7.0B
Training Tokens
2.5T tokens
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal • ZeroEval
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
Score
Score • Self-reported
TruthfulQA
Score
Score • Self-reported
Programming
Programming skills tests
HumanEval
Score
Score • Self-reported
Mathematics
Mathematical problems and computations
GSM8k
Score • Self-reported
Reasoning
Logical reasoning and analysis
BIG-Bench Hard
Score • Self-reported
DROP
Score • Self-reported
Other Tests
Specialized benchmarks
AlpacaEval 2.0
Score
Score • Self-reported
Arena Hard
Score • Self-reported
AttaQ
Score • Self-reported
HumanEval+
Score • Self-reported
IFEval
Score • Self-reported
PopQA
Score
Score • Self-reported
License & Metadata
License
Apache 2.0
Announcement Date
May 2, 2025
Last Updated
July 19, 2025
Similar Models
Granite 3.3 8B Instruct
IBM
Multimodal • 8.0B
Best score: 0.9 (HumanEval)
Released: Apr 2025
Granite 3.3 8B Base
IBM
Multimodal • 8.2B
Best score: 0.9 (HumanEval)
Released: Apr 2025
Qwen2.5-Coder 7B Instruct
Alibaba
7.0B
Best score: 0.9 (HumanEval)
Released: Sep 2024
Gemma 3 1B
Google
1.0B
Best score: 0.4 (HumanEval)
Released: Mar 2025
DeepSeek R1 Distill Qwen 1.5B
DeepSeek
1.8B
Best score: 0.3 (GPQA)
Released: Jan 2025
Llama 3.1 Nemotron Nano 8B V1
NVIDIA
8.0B
Best score: 0.5 (GPQA)
Released: Mar 2025
DeepSeek R1 Distill Qwen 7B
DeepSeek
7.6B
Best score: 0.5 (GPQA)
Released: Jan 2025
DeepSeek R1 Distill Llama 8B
DeepSeek
8.0B
Best score: 0.5 (GPQA)
Released: Jan 2025
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.
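As a rough illustration of that similarity logic, here is a hypothetical sketch in Python. The features mirror the four characteristics named above, but the scoring function and equal weights are assumptions, since the catalog does not publish its actual ranking method.

```python
# Hypothetical similarity ranking over the four catalog features.
# Weights and formula are illustrative assumptions, not the catalog's method.
from dataclasses import dataclass

@dataclass
class Model:
    developer: str
    multimodal: bool
    params_b: float    # parameter count in billions
    best_score: float  # best benchmark score, normalized to 0..1

def similarity(a: Model, b: Model) -> float:
    """Weighted similarity in [0, 1] across the four features."""
    dev = 1.0 if a.developer == b.developer else 0.0
    modal = 1.0 if a.multimodal == b.multimodal else 0.0
    # Relative closeness of parameter counts and of best scores.
    size = 1.0 - abs(a.params_b - b.params_b) / max(a.params_b, b.params_b)
    score = 1.0 - abs(a.best_score - b.best_score)
    return 0.25 * dev + 0.25 * modal + 0.25 * size + 0.25 * score

granite_tiny = Model("IBM", False, 7.0, 0.571)
qwen_coder = Model("Alibaba", False, 7.0, 0.9)
print(f"similarity: {similarity(granite_tiny, qwen_coder):.2f}")
```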