o1-preview

OpenAI

A research preview model focused on mathematical and logical reasoning abilities, demonstrating improved performance on tasks requiring step-by-step reasoning, mathematical problem-solving, and code generation. The model shows extended formal reasoning capabilities while maintaining strong general abilities.

Key Specifications

Parameters
-
Context
128.0K
Release Date
September 12, 2024
Average Score
64.8%

Timeline

Key dates in the model's history
Announcement
September 12, 2024
Last Update
July 19, 2025

Technical Specifications

Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal, ZeroEval

Pricing & Availability

Input (per 1M tokens)
$15.00
Output (per 1M tokens)
$60.00
Max Input Tokens
128.0K
Max Output Tokens
32.8K
Supported Features
Function Calling, Structured Output, Code Execution, Web Search, Batch Inference, Fine-tuning
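At the listed rates ($15.00 per 1M input tokens, $60.00 per 1M output tokens), the cost of a single request can be estimated as follows. This is an illustrative sketch using only the per-token prices from the table above; the function name is hypothetical.

```python
INPUT_PRICE_PER_M = 15.00   # USD per 1M input tokens (from the table above)
OUTPUT_PRICE_PER_M = 60.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed per-token rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A request with 10,000 input tokens and 2,000 output tokens:
print(f"${request_cost(10_000, 2_000):.2f}")  # $0.27
```

Note that output tokens cost four times as much as input tokens, so long generations dominate the bill for reasoning-heavy workloads.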

Benchmark Results

Model performance metrics across various tests and benchmarks

General Knowledge

Tests on general knowledge and understanding
MMLU
pass@1 scores each task on a single answer: the model gets exactly one attempt, and the metric is the proportion of tasks whose first answer is correct. This is the standard way AI models are evaluated when there is no opportunity to gather more information or retry, so the quality of the first answer is what matters.
Self-reported
90.8%
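The pass@1 definition above reduces to a simple proportion. A minimal sketch (hypothetical helper, not part of any benchmark harness):

```python
def pass_at_1(first_answers, reference_answers):
    """pass@1: fraction of tasks whose single first answer is correct."""
    assert len(first_answers) == len(reference_answers)
    hits = sum(a == ref for a, ref in zip(first_answers, reference_answers))
    return hits / len(first_answers)

# Example: 3 of 4 first answers match the reference -> 0.75
print(pass_at_1(["A", "B", "C", "D"], ["A", "B", "C", "X"]))  # 0.75
```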

Programming

Programming skills tests
SWE-Bench Verified
Process: To verify whether GPT-4 Turbo can ensure accurate answers, we used a verification method based on verified-response work. The process:
1. **Answer**: We give the model a task and ask it to answer together with its confidence level.
2. **Justification**: We ask the model to justify its answer and thoroughly check its own work.
3. **Re-check**: We ask the model to verify the justification once more and confirm that it still stands by its answer.
4. **Confidence rating**: We ask the model to rate its confidence on a scale from 1 to 5.
5. **"Not sure" option**: The model may answer "not sure" if it is not fully confident in its answer.
An answer counts as verified only if the model reports full confidence (5/5), expresses no doubt about its answer, and provides a justification; otherwise the answer is not counted. This process encourages the model to self-check and reduces the probability of errors.
Self-reported
41.3%
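The confidence-gated steps described above can be sketched as a small loop. This is a hypothetical illustration, not the actual harness; `ask` stands in for any model-call function, and all prompt strings are assumptions.

```python
def verified_answer(ask, question, required_confidence=5):
    """Confidence-gated answering: return the model's answer only if it
    reports full confidence after justifying and re-checking its work."""
    answer = ask(f"Answer the question: {question}")
    justification = ask(f"Justify this answer and check your work: {answer}")
    ask(f"Verify once more that this justification holds: {justification}")
    confidence = int(ask("Rate your confidence in the answer from 1 to 5."))
    if confidence < required_confidence:
        return "not sure"
    return answer

# Stub model that always answers confidently:
responses = iter(["42", "Because 6 * 7 = 42.", "Confirmed.", "5"])
print(verified_answer(lambda prompt: next(responses), "6 * 7?"))  # 42
```

Gating on a 5/5 self-rating trades coverage for precision: the model abstains ("not sure") rather than emit a doubtful answer.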

Mathematics

Mathematical problems and computations
MATH
pass@1 is the probability that the model produces a correct answer on its first attempt. One solution is sampled per task, and the score is the proportion of tasks answered correctly, with no retries, improvements, or corrections allowed. This is a strict measure, and it is especially important in contexts where an exact answer is required immediately, with no opportunity for repeated attempts or reconsideration.
Self-reported
85.5%
MGSM
pass@1 estimates the probability that the model answers correctly on its first attempt. It rests on the observation that a model can give different answers to the same task under different sampling settings. To estimate pass@1, one can generate several answers per task (for example, 100 samples at various temperatures) and measure what fraction are correct; this yields the probability of a correct first answer. Such an approach gives a more reliable evaluation of the model's capability than checking a single answer, since it accounts for sampling variance.
Self-reported
90.8%

Reasoning

Logical reasoning and analysis
GPQA
Pass@1 is a metric for evaluating a model's efficiency at solving tasks in a single pass. Unlike trial-and-error approaches that allow the model several attempts, pass@1 measures its ability to produce a correct solution on the first try. The model receives exactly one attempt per task, and the result is the proportion of tasks solved correctly on that first attempt. This metric is especially important for assessing a model's ability to solve complex mathematical and logical problems, where the first solution must stand. A high pass@1 score indicates that the model understands the problem and can apply the relevant knowledge and strategies without needing an iterative approach. It is considered a stricter measure and closer to real use, where users typically expect an exact answer immediately rather than after several tries.
Self-reported
73.3%

Other Tests

Specialized benchmarks
AIME 2024
Accuracy on first attempts (pass@1) on problems from the 2024 American Invitational Mathematics Examination, a competition-level mathematics benchmark.
Self-reported
42.0%
LiveBench
The coding category covers any query involving code: writing code, explaining code, finding bugs, or converting between languages. Examples: a Python function for working with words, an SQL query for extracting users after 2022, fixing HTML, "what does this JavaScript do", an Android app, "How can I improve this code?", "translate this code from R to Python". It also includes queries about HTML, CSS, JSON, YAML, and similar formats even when they are not strictly code.
Self-reported
52.3%
SimpleQA
Factual accuracy: measures a model's ability to recall knowledge from its training data, especially in specialized fields. Whether a specific question is answered correctly generally depends on whether the fact appeared in the training data and scales with model size.
Self-reported
42.4%

License & Metadata

License
proprietary
Announcement Date
September 12, 2024
Last Updated
July 19, 2025

Similar Models

All Models

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.