IBM Granite 4.0 Tiny Preview
Granite 4.0 Tiny Preview is a compact language model from IBM's Granite family, designed for resource-constrained deployments. Despite its small size, it delivers competitive performance in reasoning, coding, and instruction following, and is optimized for edge computing and low-latency applications.
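For readers who want to try the model, the sketch below shows a minimal local inference loop with Hugging Face transformers. The repository id ibm-granite/granite-4.0-tiny-preview is an assumption based on IBM's naming for earlier Granite releases; confirm it against the official model card before use.

```python
# Minimal sketch: run Granite 4.0 Tiny Preview locally via transformers.
# The repo id below is an assumption; check IBM's Hugging Face page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-tiny-preview"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision suits constrained hardware
    device_map="auto",
)

# Instruct-style Granite models ship a chat template; apply_chat_template
# formats the conversation the way the model expects.
messages = [{"role": "user", "content": "Summarize what a context window is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```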
Key Specifications
Parameters
7.0B
Context
-
Release Date
May 2, 2025
Average Score
57.1%
Timeline
Key dates in the model's history
Announcement
May 2, 2025
Last Update
July 19, 2025
Technical Specifications
Parameters
7.0B
Training Tokens
2.5T tokens
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal • ZeroEval
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
Score
Score • Self-reported
TruthfulQA
Score
Score • Self-reported
Programming
Programming skills tests
HumanEval
Score
Score • Self-reported
Mathematics
Mathematical problems and computations
GSM8k
Score • Self-reported
Reasoning
Logical reasoning and analysis
BIG-Bench Hard
Score • Self-reported
DROP
Score • Self-reported
Other Tests
Specialized benchmarks
AlpacaEval 2.0
Score
Score • Self-reported
Arena Hard
Score • Self-reported
AttaQ
Score • Self-reported
HumanEval+
Score • Self-reported
IFEval
Score • Self-reported
PopQA
Score
Score • Self-reported
License & Metadata
License
Apache 2.0
Announcement Date
May 2, 2025
Last Updated
July 19, 2025
Similar Models
Granite 3.3 8B Instruct
IBM
Multimodal • 8.0B
Best score: 0.9 (HumanEval)
Released: Apr 2025
Granite 3.3 8B Base
IBM
Multimodal • 8.2B
Best score: 0.9 (HumanEval)
Released: Apr 2025
Qwen2.5-Coder 7B Instruct
Alibaba
7.0B
Best score: 0.9 (HumanEval)
Released: Sep 2024
Gemma 3 1B
Google
1.0B
Best score: 0.4 (HumanEval)
Released: Mar 2025
DeepSeek R1 Distill Qwen 1.5B
DeepSeek
1.8B
Best score: 0.3 (GPQA)
Released: Jan 2025
Llama 3.1 Nemotron Nano 8B V1
NVIDIA
8.0B
Best score: 0.5 (GPQA)
Released: Mar 2025
DeepSeek R1 Distill Qwen 7B
DeepSeek
7.6B
Best score: 0.5 (GPQA)
Released: Jan 2025
DeepSeek R1 Distill Llama 8B
DeepSeek
8.0B
Best score: 0.5 (GPQA)
Released: Jan 2025
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.
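As a rough illustration of that similarity logic, here is a hypothetical sketch in Python. The features mirror the four characteristics named above, but the scoring function and equal weights are assumptions, since the catalog does not publish its actual ranking method.

```python
# Hypothetical similarity ranking over the four catalog features.
# Weights and formula are illustrative assumptions, not the catalog's method.
from dataclasses import dataclass

@dataclass
class Model:
    developer: str
    multimodal: bool
    params_b: float    # parameter count in billions
    best_score: float  # best benchmark score, normalized to 0..1

def similarity(a: Model, b: Model) -> float:
    """Weighted similarity in [0, 1] across the four features."""
    dev = 1.0 if a.developer == b.developer else 0.0
    modal = 1.0 if a.multimodal == b.multimodal else 0.0
    # Relative closeness of parameter counts and of best scores.
    size = 1.0 - abs(a.params_b - b.params_b) / max(a.params_b, b.params_b)
    score = 1.0 - abs(a.best_score - b.best_score)
    return 0.25 * dev + 0.25 * modal + 0.25 * size + 0.25 * score

granite_tiny = Model("IBM", False, 7.0, 0.571)
qwen_coder = Model("Alibaba", False, 7.0, 0.9)
print(f"similarity: {similarity(granite_tiny, qwen_coder):.2f}")
```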