
Granite 3.3 8B Base

Multimodal
IBM

Granite 3.3 8B Base is a foundational language model from IBM's Granite family with 8 billion parameters. This is the pre-trained base model prior to instruction tuning, suitable for fine-tuning on domain-specific tasks. It demonstrates strong capabilities in language understanding, reasoning, and knowledge retrieval.
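Since the card notes the base model is intended as a starting point for fine-tuning, a minimal loading sketch may help. This assumes the Hugging Face `transformers` library and a repository id following IBM's `ibm-granite/...` naming convention; verify the exact id on the Hugging Face Hub before use.

```python
# Hedged sketch: loading the base model with Hugging Face transformers
# as a starting point for domain-specific fine-tuning.
# The repo id is an assumption based on IBM's naming convention.
MODEL_ID = "ibm-granite/granite-3.3-8b-base"  # assumed Hub repo id


def load_base_model(model_id: str = MODEL_ID):
    """Load tokenizer and model. Base (non-instruct) models take plain
    text prompts and have no chat template. transformers is imported
    lazily so this module can be inspected without the dependency."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_base_model()
    inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

From here, fine-tuning would typically proceed with a standard causal-language-modeling objective on domain text.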

Key Specifications

Parameters
8.2B
Context
-
Release Date
April 16, 2025
Average Score
64.3%

Timeline

Key dates in the model's history
Announcement
April 16, 2025
Last Update
July 19, 2025

Technical Specifications

Parameters
8.2B
Training Tokens
-
Knowledge Cutoff
April 1, 2024
Family
-
Capabilities
Multimodal, ZeroEval

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
HellaSwag
Self-reported
80.1%
MMLU
Self-reported
63.9%
TruthfulQA
Self-reported
52.1%
Winogrande
Self-reported
74.4%

Programming

Programming skills tests
HumanEval
Self-reported (OLMES)
89.7%

Mathematics

Mathematical problems and computations
GSM8k
Self-reported
59.0%

Reasoning

Logical reasoning and analysis
BIG-Bench Hard
Self-reported (OLMES)
69.1%
DROP
Self-reported
36.1%

Other Tests

Specialized benchmarks
AGIEval
Self-reported
49.3%
AIME 2024
Self-reported
81.2%
AlpacaEval 2.0
Self-reported
62.7%
ARC-C
Self-reported
50.8%
Arena Hard
Self-reported
57.6%
AttaQ
Self-reported (OLMES)
88.5%
HumanEval+
Self-reported (OLMES)
86.1%
IFEval
Self-reported (OLMES)
74.8%
MATH-500
Self-reported
69.0%
NQ
Self-reported
36.5%
PopQA
Self-reported
26.2%
TriviaQA
Self-reported
78.2%
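The "Average Score" of 64.3% reported above appears to be the simple arithmetic mean of the twenty benchmark scores listed on this page. A quick sketch to check that (benchmark names and values copied verbatim from the table above):

```python
# Verify that the reported "Average Score" (64.3%) is the plain mean of
# the twenty self-reported benchmark scores listed on this page.
scores = {
    "HellaSwag": 80.1, "MMLU": 63.9, "TruthfulQA": 52.1, "Winogrande": 74.4,
    "HumanEval": 89.7, "GSM8k": 59.0, "BIG-Bench Hard": 69.1, "DROP": 36.1,
    "AGIEval": 49.3, "AIME 2024": 81.2, "AlpacaEval 2.0": 62.7, "ARC-C": 50.8,
    "Arena Hard": 57.6, "AttaQ": 88.5, "HumanEval+": 86.1, "IFEval": 74.8,
    "MATH-500": 69.0, "NQ": 36.5, "PopQA": 26.2, "TriviaQA": 78.2,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.1f}%")  # → 64.3%
```

This reproduces the headline figure exactly, which suggests the page's average is an unweighted mean rather than a weighted or normalized aggregate.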

License & Metadata

License
Apache 2.0
Announcement Date
April 16, 2025
Last Updated
July 19, 2025

Similar Models


Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.