Key Specifications
Parameters
-
Context
300.0K
Release Date
November 20, 2024
Average Score
70.7%
Timeline
Key dates in the model's history
Announcement
November 20, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval
Pricing & Availability
Input (per 1M tokens)
$0.06
Output (per 1M tokens)
$0.24
Max Input Tokens
300.0K
Max Output Tokens
2.0K
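At the listed rates ($0.06 per 1M input tokens, $0.24 per 1M output tokens), per-request cost is simple arithmetic. A minimal sketch (the helper name is illustrative, not part of any official SDK):

```python
# Cost estimate at the listed rates: $0.06 / 1M input tokens, $0.24 / 1M output tokens.
INPUT_PRICE_PER_M = 0.06
OUTPUT_PRICE_PER_M = 0.24

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request at the listed per-1M-token rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A 10K-token prompt with a 1K-token reply:
print(f"${request_cost(10_000, 1_000):.6f}")  # $0.000840
```

Note the 2K max output cap above: output cost per request is bounded at roughly $0.0005, so long prompts dominate the bill.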
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
0-shot chain-of-thought • Self-reported
Programming
Programming skills tests
HumanEval
pass@1 (estimated from n samples) • Self-reported
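The HumanEval note refers to pass@1 estimated from n generated samples. A common unbiased estimator (popularized by the HumanEval paper) computes pass@k from the number of correct samples c out of n; a minimal sketch, with an illustrative function name:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n generated samples, c of them correct.
    pass@k = 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Too few incorrect samples to fill a set of k: success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 10 samples, 3 correct: pass@1 reduces to c/n.
print(round(pass_at_k(10, 3, 1), 6))  # 0.3
```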
Mathematics
Mathematical problems and computations
GSM8k
0-shot CoT • Self-reported
MATH
0-shot chain-of-thought (CoT) • Self-reported
Reasoning
Logical reasoning and analysis
DROP
Self-reported
GPQA
6-shot CoT • Self-reported
Multimodal
Working with images and visual data
ChartQA
relaxed accuracy • Self-reported
DocVQA
ANLS • Self-reported
MMMU
CoT accuracy • Self-reported
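DocVQA above is conventionally scored with ANLS (Average Normalized Levenshtein Similarity), which maps edit distance to a similarity and zeroes out any score below a 0.5 threshold. A minimal sketch assuming that standard definition (function names are my own):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def anls(prediction: str, ground_truths: list[str], tau: float = 0.5) -> float:
    """ANLS for one question: best similarity over ground truths,
    where similarity = 1 - NLD if NLD < tau, else 0."""
    best = 0.0
    for gt in ground_truths:
        p, g = prediction.lower().strip(), gt.lower().strip()
        nld = levenshtein(p, g) / max(len(p), len(g), 1)
        best = max(best, 1.0 - nld if nld < tau else 0.0)
    return best
```

The dataset score is the mean of `anls` over all questions; the threshold keeps near-misses (OCR-level typos) partially credited while zeroing unrelated answers.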
Other Tests
Specialized benchmarks
ARC-C
0-shot chain-of-thought • Self-reported
BBH
3-shot CoT • Self-reported
BFCL
accuracy • Self-reported
CRAG
accuracy • Self-reported
EgoSchema
accuracy • Self-reported
FinQA
0-shot accuracy • Self-reported
GroundUI-1K
accuracy • Self-reported
IFEval
0-shot CoT • Self-reported
LVBench
accuracy • Self-reported
MM-Mind2Web
accuracy • Self-reported
SQuALITY
ROUGE-L • Self-reported
TextVQA
weighted accuracy • Self-reported
Translation en→Set1 COMET22
COMET22 • Self-reported
Translation en→Set1 spBleu
spBleu • Self-reported
Translation Set1→en COMET22
COMET22 • Self-reported
Translation Set1→en spBleu
spBleu • Self-reported
VATEX
CIDEr • Self-reported
VisualWebBench
accuracy • Self-reported
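SQuALITY above is scored with ROUGE-L, the F-measure over the longest common subsequence (LCS) of candidate and reference tokens. A minimal sketch using the balanced (beta = 1) harmonic mean, as in many implementations; function names are my own:

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for ta in a:
        cur = [0]
        for j, tb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ta == tb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F-measure: harmonic mean of LCS-based precision and recall."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return 2 * prec * rec / (prec + rec)
```

Unlike n-gram overlap, the LCS rewards in-order matches without requiring them to be contiguous, which suits long-form summaries.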
License & Metadata
License
proprietary
Announcement Date
November 20, 2024
Last Updated
July 19, 2025
Similar Models
Nova Pro
Amazon
MM
Best score: 0.9 (ARC)
Released: Nov 2024
Price: $0.80/1M tokens
Nova Micro
Amazon
Best score: 0.9 (ARC)
Released: Nov 2024
Price: $0.03/1M tokens
GPT-4.1
OpenAI
MM
Best score: 0.9 (MMLU)
Released: Apr 2025
Price: $2.00/1M tokens
o1-pro
OpenAI
MM
Best score: 0.8 (GPQA)
Released: Dec 2024
GPT-4
OpenAI
MM
Best score: 1.0 (ARC)
Released: Jun 2023
Price: $30.00/1M tokens
GPT-4o
OpenAI
MM
Best score: 0.9 (HumanEval)
Released: May 2024
Price: $2.50/1M tokens
Gemini 2.0 Flash Thinking
Google
MM
Best score: 0.7 (GPQA)
Released: Jan 2025
Gemini 1.5 Pro
Google
MM
Best score: 0.9 (MMLU)
Released: May 2024
Price: $2.50/1M tokens
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance.