Nova Pro

Multimodal
Amazon

Amazon Nova Pro is a high-performance multimodal model that balances accuracy, speed, and cost across a wide range of tasks. It accepts text, image, and video inputs and supports agentic workflows, making it well suited to complex enterprise tasks that require in-depth analysis, multi-step reasoning, and content generation.
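A minimal sketch of calling Nova Pro through the Amazon Bedrock Converse API with boto3. The model ID and region below are assumptions; check the Bedrock console for the identifiers actually enabled in your account.

import boto3

# Bedrock runtime client; the region is an assumption.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-pro-v1:0",  # assumed model ID; verify in your account
    messages=[{
        "role": "user",
        "content": [{"text": "Summarize the key risks in this contract clause: ..."}],
    }],
    inferenceConfig={"maxTokens": 512, "temperature": 0.3},
)

print(response["output"]["message"]["content"][0]["text"])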

Key Specifications

Parameters
-
Context
300.0K
Release Date
November 20, 2024
Average Score
73.2%

Timeline

Key dates in the model's history
Announcement
November 20, 2024
Last Update
July 19, 2025

Technical Specifications

Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$0.80
Output (per 1M tokens)
$3.20
Max Input Tokens
300.0K
Max Output Tokens
300.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
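The listed rates make per-request costs easy to estimate. A small sketch in Python, using the rates from the table above (the token counts in the example are hypothetical):

# Cost per request at the listed Nova Pro rates:
# $0.80 per 1M input tokens, $3.20 per 1M output tokens.
INPUT_RATE = 0.80 / 1_000_000
OUTPUT_RATE = 3.20 / 1_000_000

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 50,000-token document summarized into a 1,000-token answer.
print(f"${request_cost(50_000, 1_000):.4f}")  # 0.0400 + 0.0032 = $0.0432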

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
MMLU
0-shot Chain-of-Thought · Self-reported
85.9%

Programming

Programming skills tests
HumanEval
0-shot pass@1 · Self-reported
89.0%

Mathematics

Mathematical problems and computations
GSM8k
0-shot Chain-of-Thought · Self-reported
94.8%
MATH
0-shot Chain-of-Thought · Self-reported
76.6%

Reasoning

Logical reasoning and analysis
DROP
0-shot · Self-reported
85.4%
GPQA
6-shot Chain-of-Thought · Self-reported
46.9%

Multimodal

Working with images and visual data
ChartQA
Relaxed accuracy · Self-reported
89.2%
DocVQA
ANLS (Average Normalized Levenshtein Similarity) · Self-reported
93.5%
MMMU
Chain-of-Thought · Self-reported
61.7%
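The document and chart scores above are produced from image inputs. A sketch of passing an image alongside a text prompt through the Converse API, under the same assumptions as before (the model ID and the file "chart.png" are hypothetical):

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Read a chart image from disk; "chart.png" is a placeholder file name.
with open("chart.png", "rb") as f:
    image_bytes = f.read()

response = client.converse(
    modelId="amazon.nova-pro-v1:0",  # assumed model ID
    messages=[{
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": image_bytes}}},
            {"text": "What is the highest value shown in this chart?"},
        ],
    }],
)

print(response["output"]["message"]["content"][0]["text"])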

Other Tests

Specialized benchmarks
ARC-C
0-shot Chain-of-Thought · Self-reported
94.8%
BBH
3-shot Chain-of-Thought · Self-reported
86.9%
BFCL
Accuracy · Self-reported
68.4%
CRAG
Accuracy · Self-reported
50.3%
EgoSchema
Accuracy · Self-reported
72.1%
FinQA
0-shot accuracy · Self-reported
77.2%
GroundUI-1K
Accuracy · Self-reported
81.4%
IFEval
0-shot · Self-reported
92.1%
LVBench
Accuracy · Self-reported
41.6%
MM-Mind2Web
Step accuracy (%) · Self-reported
63.7%
SQuALITY
ROUGE-L · Self-reported
19.8%
TextVQA
Weighted accuracy · Self-reported
81.5%
Translation en→Set1 COMET22
COMET22 score · Self-reported
89.1%
Translation en→Set1 spBleu
spBleu · Self-reported
43.4%
Translation Set1→en COMET22
COMET22 score · Self-reported
89.0%
Translation Set1→en spBleu
spBleu · Self-reported
44.4%
VATEX
CIDEr · Self-reported
77.8%
VisualWebBench
Standard evaluation · Self-reported
79.7%

License & Metadata

License
proprietary
Announcement Date
November 20, 2024
Last Updated
July 19, 2025
