o1-preview
A research preview model focused on mathematical and logical reasoning. It demonstrates improved performance on tasks requiring step-by-step reasoning, mathematical problem-solving, and code generation, and shows extended formal reasoning capabilities while maintaining strong general abilities.
Key Specifications
Parameters
-
Context
128.0K
Release Date
September 12, 2024
Average Score
64.8%
Timeline
Key dates in the model's history
Announcement
September 12, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal · ZeroEval
Pricing & Availability
Input (per 1M tokens)
$15.00
Output (per 1M tokens)
$60.00
Max Input Tokens
128.0K
Max Output Tokens
32.8K
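At these rates, per-request cost is a straightforward weighted sum. A minimal sketch with hypothetical token counts (note that for reasoning models, hidden reasoning tokens are typically billed as output tokens, so real completions can cost more than the visible text suggests):

```python
# Cost of one o1-preview request at the listed rates.
INPUT_USD_PER_M = 15.00    # USD per 1M input tokens
OUTPUT_USD_PER_M = 60.00   # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

# Example: a 10,000-token prompt with a 2,000-token completion -> $0.27
print(f"${request_cost(10_000, 2_000):.2f}")
```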
Supported Features
Function Calling · Structured Output · Code Execution · Web Search · Batch Inference · Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
Under pass@1 evaluation, only one answer is allowed per task. This mirrors the standard way AI models are used: when we ask a question, we receive a single answer, and if the model has no opportunity to gather more information or otherwise refine its response, the quality of that first answer is what matters. In our analysis, pass@1 measures the proportion of tasks for which the model's first and only answer is correct. • Self-reported
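In code, that definition amounts to a simple proportion. A minimal sketch (the grading results below are hypothetical):

```python
# pass@1 over a benchmark: the fraction of tasks whose single, first
# answer was graded correct.
def pass_at_1(first_answer_correct: list[bool]) -> float:
    return sum(first_answer_correct) / len(first_answer_correct)

# Example: the first answer was correct on 3 of 4 tasks -> 0.75
print(pass_at_1([True, True, False, True]))
```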
Programming
Programming skills tests
SWE-Bench Verified
# Process
To verify whether GPT-4 Turbo can actually ensure the accuracy of its answers, we used a method from the Verified-responses work. We developed the following process:
1. **Answer**: We give the model the task and ask GPT-4 Turbo to provide an answer together with its confidence level.
2. **Query justification**: We ask the model to provide a justification for its answer and to thoroughly verify its work.
3. **Re-check**: We ask the model to verify its justification once more and confirm whether it stands by its answer.
4. **Query confidence level**: We ask the model to rate its confidence on a scale from 1 to 5.
5. **"Not sure" option**: We allow the model to answer "not sure" if it is not fully confident in its answer.
We count an answer as "verified" only if the model reports full confidence (5/5), expresses no doubt about its answer, and provides a justification; an answer that fails this process is not counted as verified. This process encourages the model to check itself and gives it a set of opportunities to reduce the probability of errors. • Self-reported
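A sketch of the five-step loop described above. `ask_model()` is a hypothetical stand-in for a chat-completion API call, and the prompts are paraphrases for illustration, not the original ones:

```python
from typing import Callable, Optional

# Illustrative sketch of the self-verification protocol; every prompt
# string here is an assumption, not the original wording.
def verified_answer(task: str, ask_model: Callable[[str], str]) -> Optional[str]:
    # 1. Answer: solve the task and report a confidence level.
    answer = ask_model(f"Solve the task and state your confidence:\n{task}")
    # 2. Justification: explain the answer and check the work.
    justification = ask_model(f"Justify this answer and verify your work:\n{answer}")
    # 3. Re-check: verify the justification one more time.
    recheck = ask_model(f"Verify this justification once more. "
                        f"Do you stand by the answer?\n{justification}")
    # 4. Confidence: rate confidence on a 1-5 scale.
    confidence = int(ask_model("Rate your confidence from 1 to 5. "
                               "Reply with a single digit."))
    # 5. Opt-out: only fully confident, doubt-free answers count as verified.
    if confidence < 5 or "not sure" in recheck.lower():
        return None  # not counted as verified
    return answer
```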
Mathematics
Mathematical problems and computations
MATH
pass@1. This score is the probability that the model produces a correct answer on its first attempt. It measures the proportion of tasks the model solves correctly without any opportunity for hints or retries. To compute the metric, one solution is sampled from the model for each task, and the score is the proportion of tasks for which that answer is correct. It is a strict measure, since it allows no attempts at improving or correcting the answer, and it is especially important in contexts where an exact answer is required and there is no opportunity for repeated attempts or further deliberation. • Self-reported
MGSM
pass@1. This method estimates the probability that the model produces a correct answer on its first attempt. It is based on the observation that a model can give different answers to the same task across runs (for example, at different temperatures). To compute pass@1, we first sample several answers from the model for one and the same task (with different random seeds or temperatures) and then estimate the probability that the first attempt is correct. Suppose we use the model to solve coding tasks: we could generate several solutions for each task (for example, 100 solutions at various temperatures) and then evaluate what percentage of them is correct. This yields the probability that the model answers correctly on the first attempt. Such an approach gives a more robust estimate of the model's capabilities than checking a single answer, since it accounts for the model's randomness. • Self-reported
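The sampling procedure described here is commonly computed with the unbiased pass@k estimator introduced with HumanEval. A sketch, assuming n samples per task of which c passed:

```python
from math import comb

# Unbiased pass@k estimator: given n sampled solutions per task, c of
# which are correct, estimate the chance that at least one of k randomly
# drawn samples is correct. For k=1 this reduces to c/n.
def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # too few failures to fill k draws: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 100 samples per task, 30 correct -> pass@1 estimate of 0.30
print(pass_at_k(100, 30, 1))
```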
Reasoning
Logical reasoning and analysis
GPQA
Pass@1 is a metric used to evaluate a model's effectiveness at solving tasks in a single pass. Unlike trial-and-error methods, which allow the model several attempts, pass@1 measures the model's ability to produce a correct solution on the first attempt. This metric is especially important for assessing a model's ability to solve complex mathematical and logical tasks, where the first solution should be correct. A high pass@1 score indicates that the model understands the problem and can effectively apply the relevant knowledge and solution strategies without needing an iterative approach. Under pass@1 evaluation, the model receives only one attempt at each task, and the result is the proportion of tasks solved correctly on the first attempt. This metric is considered stricter and closer to real-world use, where users usually expect an exact answer immediately rather than after several tries. • Self-reported
Other Tests
Specialized benchmarks
AIME 2024
pass@1: only the model's first answer to each problem counts toward the score. • Self-reported
LiveBench
Coding: this category covers queries that ask to write code, explain code, or work with code in any way. It includes writing code, explaining code, finding bugs, and related requests. For example: "write a function in Python for … words in …", "an SQL query to extract users … after 2022", "… this HTML", "what does this JavaScript do", "… for Android", "How can I improve this …?", "convert this code from R to Python". It also includes queries about HTML, CSS, JSON, YAML, and other similar formats, even if they are not strictly code. • Self-reported
SimpleQA
Factual accuracy. A model trained on data acquires the ability to recall knowledge, especially in certain fields. If a specific question appears in the model's training data, whether it is answered correctly will, in general, depend on the model's size. • Self-reported
License & Metadata
License
proprietary
Announcement Date
September 12, 2024
Last Updated
July 19, 2025
Similar Models
GPT-4 Turbo
OpenAI
Best score: 0.9 (HumanEval)
Released: Apr 2024
Price: $10.00/1M tokens
o1-mini
OpenAI
Best score: 0.9 (HumanEval)
Released: Sep 2024
Price: $3.00/1M tokens
o1
OpenAI
Best score: 0.9 (MMLU)
Released: Dec 2024
Price: $15.00/1M tokens
GPT-5 Codex
OpenAI
Released: Sep 2025
Price: $2.00/1M tokens
o3-mini
OpenAI
Best score: 0.9 (MMLU)
Released: Jan 2025
Price: $1.10/1M tokens
GPT-3.5 Turbo
OpenAI
Best score: 0.7 (MMLU)
Released: Mar 2023
Price: $0.50/1M tokens
o3
OpenAI
Multimodal
Best score: 0.8 (GPQA)
Released: Apr 2025
Price: $2.00/1M tokens
o1-pro
OpenAI
Multimodal
Best score: 0.8 (GPQA)
Released: Dec 2024
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.