Nova Pro

Multimodal
Amazon

Amazon Nova Pro is a high-performance multimodal model that balances accuracy, speed, and cost across a wide range of tasks. It accepts text, image, and video inputs and supports agentic workflows, making it well suited to complex enterprise tasks that require in-depth analysis, multi-step reasoning, and content generation.
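A minimal sketch of calling Nova Pro through the Amazon Bedrock Converse API with boto3. The model ID and region below are assumptions; check the Bedrock console for the identifiers actually enabled in your account.

import boto3

# Bedrock runtime client; the region is an assumption.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-pro-v1:0",  # assumed model ID; verify in your account
    messages=[{
        "role": "user",
        "content": [{"text": "Summarize the key risks in this contract clause: ..."}],
    }],
    inferenceConfig={"maxTokens": 512, "temperature": 0.3},
)

print(response["output"]["message"]["content"][0]["text"])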

Key Specifications

Parameters
-
Context
300.0K
Release Date
November 20, 2024
Average Score
73.2%

Timeline

Key dates in the model's history
Announcement
November 20, 2024
Last Update
July 19, 2025

Technical Specifications

Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$0.80
Output (per 1M tokens)
$3.20
Max Input Tokens
300.0K
Max Output Tokens
300.0K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
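The listed rates make per-request costs easy to estimate. A small sketch in Python, using the rates from the table above (the token counts in the example are hypothetical):

# Cost per request at the listed Nova Pro rates:
# $0.80 per 1M input tokens, $3.20 per 1M output tokens.
INPUT_RATE = 0.80 / 1_000_000
OUTPUT_RATE = 3.20 / 1_000_000

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 50,000-token document summarized into a 1,000-token answer.
print(f"${request_cost(50_000, 1_000):.4f}")  # 0.0400 + 0.0032 = $0.0432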

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
MMLU
0-shot Chain-of-Thought · Self-reported
85.9%

Programming

Programming skills tests
HumanEval
0-shot pass@1 · Self-reported
89.0%

Mathematics

Mathematical problems and computations
GSM8k
0-shot Chain-of-Thought · Self-reported
94.8%
MATH
0-shot Chain-of-Thought · Self-reported
76.6%

Reasoning

Logical reasoning and analysis
DROP
0-shot · Self-reported
85.4%
GPQA
6-shot Chain-of-Thought · Self-reported
46.9%

Multimodal

Working with images and visual data
ChartQA
Relaxed accuracy · Self-reported
89.2%
DocVQA
ANLS (Average Normalized Levenshtein Similarity) · Self-reported
93.5%
MMMU
Chain-of-Thought · Self-reported
61.7%
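The document and chart scores above are produced from image inputs. A sketch of passing an image alongside a text prompt through the Converse API, under the same assumptions as before (the model ID and the file "chart.png" are hypothetical):

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Read a chart image from disk; "chart.png" is a placeholder file name.
with open("chart.png", "rb") as f:
    image_bytes = f.read()

response = client.converse(
    modelId="amazon.nova-pro-v1:0",  # assumed model ID
    messages=[{
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": image_bytes}}},
            {"text": "What is the highest value shown in this chart?"},
        ],
    }],
)

print(response["output"]["message"]["content"][0]["text"])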

Other Tests

Specialized benchmarks
ARC-C
0-shot Chain-of-Thought · Self-reported
94.8%
BBH
3-shot Chain-of-Thought · Self-reported
86.9%
BFCL
Accuracy · Self-reported
68.4%
CRAG
Accuracy · Self-reported
50.3%
EgoSchema
Accuracy · Self-reported
72.1%
FinQA
0-shot accuracy · Self-reported
77.2%
GroundUI-1K
Accuracy · Self-reported
81.4%
IFEval
0-shot · Self-reported
92.1%
LVBench
Accuracy · Self-reported
41.6%
MM-Mind2Web
Step accuracy (%) · Self-reported
63.7%
SQuALITY
ROUGE-L · Self-reported
19.8%
TextVQA
Weighted accuracy · Self-reported
81.5%
Translation en→Set1 COMET22
COMET22 score · Self-reported
89.1%
Translation en→Set1 spBleu
spBleu · Self-reported
43.4%
Translation Set1→en COMET22
COMET22 score · Self-reported
89.0%
Translation Set1→en spBleu
spBleu · Self-reported
44.4%
VATEX
CIDEr · Self-reported
77.8%
VisualWebBench
Standard evaluation · Self-reported
79.7%

License & Metadata

License
proprietary
Announcement Date
November 20, 2024
Last Updated
July 19, 2025
