Granite 3.3 8B Base
Granite 3.3 8B Base is a foundational language model from IBM's Granite family with 8 billion parameters. It is the pre-trained base model, prior to instruction tuning, and is suitable for fine-tuning on domain-specific tasks. It demonstrates strong capabilities in language understanding, reasoning, and knowledge retrieval.
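Because this is a base checkpoint rather than a chat model, the typical workflow is to load it for completion-style generation or continued training. A minimal loading sketch, assuming the Hugging Face checkpoint id `ibm-granite/granite-3.3-8b-base` and the `transformers` library (verify both against the official model card):

```python
MODEL_ID = "ibm-granite/granite-3.3-8b-base"  # assumed Hugging Face id; verify on the model card

def load_base_model():
    """Load the base checkpoint for completion-style use or fine-tuning.

    Requires `pip install transformers accelerate` and roughly 16 GB of
    weights, so the heavy import is deferred until the function is called.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # keep the checkpoint's native precision
        device_map="auto",    # spread layers across available devices
    )
    return tokenizer, model
```

Plain completion prompts are the right fit here; chat templates belong to the Instruct variant of the model.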
Key Specifications
Parameters
8.2B
Context
-
Release Date
April 16, 2025
Average Score
64.3%
Timeline
Key dates in the model's history
Announcement
April 16, 2025
Last Update
July 19, 2025
Today
March 25, 2026
Technical Specifications
Parameters
8.2B
Training Tokens
-
Knowledge Cutoff
April 1, 2024
Family
-
Capabilities
Multimodal • ZeroEval
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
HellaSwag
Not specified • Self-reported
MMLU
Not specified • Self-reported
TruthfulQA
Not specified • Self-reported
Winogrande
Not specified • Self-reported
Programming
Programming skills tests
HumanEval
Not specified (OLMES) • Self-reported
Mathematics
Mathematical problems and computations
GSM8k
Not specified • Self-reported
Reasoning
Logical reasoning and analysis
BIG-Bench Hard
Not specified (OLMES) • Self-reported
DROP
Not specified • Self-reported
Other Tests
Specialized benchmarks
AGIEval
Not specified • Self-reported
AIME 2024
Not specified • Self-reported
AlpacaEval 2.0
## Score "", "" "". evaluation0 3 • Self-reported
ARC-C
Not specified • Self-reported
Arena Hard
Not specified • Self-reported
AttaQ
Not specified (OLMES) • Self-reported
HumanEval+
Not specified (OLMES) • Self-reported
IFEval
Not specified (OLMES) • Self-reported
MATH-500
Not specified • Self-reported
NQ
Not specified • Self-reported
PopQA
Not specified • Self-reported
TriviaQA
Not specified • Self-reported
License & Metadata
License
Apache 2.0
Announcement Date
April 16, 2025
Last Updated
July 19, 2025
Similar Models
Granite 3.3 8B Instruct
IBM
Multimodal • 8.0B
Best score: 0.9 (HumanEval)
Released: Apr 2025
IBM Granite 4.0 Tiny Preview
IBM
7.0B
Best score: 0.8 (HumanEval)
Released: May 2025
Gemma 3n E4B
Multimodal • 8.0B
Best score: 0.6 (ARC)
Released: Jun 2025
Phi-3.5-vision-instruct
Microsoft
Multimodal • 4.2B
Released: Aug 2024
Phi-4-multimodal-instruct
Microsoft
Multimodal • 5.6B
Released: Feb 2025
Price: $0.05/1M tokens
Gemini 1.5 Flash 8B
Multimodal • 8.0B
Best score: 0.4 (GPQA)
Released: Mar 2024
Price: $0.07/1M tokens
Gemma 3n E2B
Multimodal • 8.0B
Best score: 0.5 (ARC)
Released: Jun 2025
MedGemma 4B IT
Multimodal • 4.3B
Released: May 2025
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.
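The similarity ranking described above can be sketched as a weighted score over the four named characteristics. The weights, field names, and example values below are hypothetical assumptions for illustration; the catalog does not publish its actual formula:

```python
# Hypothetical similarity score over the four characteristics the catalog
# names: developer organization, multimodality, parameter size, and benchmark
# performance. All weights and field names are illustrative assumptions.

def similarity(a: dict, b: dict) -> float:
    score = 0.0
    if a["developer"] == b["developer"]:
        score += 0.3                      # same organization
    if a["multimodal"] == b["multimodal"]:
        score += 0.2                      # matching modality
    # Parameter sizes close on a relative scale count as similar.
    ratio = min(a["params_b"], b["params_b"]) / max(a["params_b"], b["params_b"])
    score += 0.3 * ratio
    # Benchmark performance: closeness of best scores, both in [0, 1].
    score += 0.2 * (1.0 - abs(a["best_score"] - b["best_score"]))
    return score  # in [0, 1]; higher means more similar

base = {"developer": "IBM", "multimodal": False, "params_b": 8.2, "best_score": 0.643}
instruct = {"developer": "IBM", "multimodal": False, "params_b": 8.0, "best_score": 0.9}
gemma = {"developer": "Google", "multimodal": True, "params_b": 8.0, "best_score": 0.6}

# A same-family model should rank above an unrelated one.
assert similarity(base, instruct) > similarity(base, gemma)
```

Under this toy weighting, the same-developer Instruct variant outscores an unrelated model of similar size, which matches the ordering shown in the list above.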