o1-preview
A research preview model focused on mathematical and logical reasoning. It demonstrates improved performance on tasks requiring step-by-step reasoning, mathematical problem-solving, and code generation, and shows extended formal reasoning capabilities while maintaining strong general abilities.
Key Specifications
Parameters
-
Context
128.0K
Release Date
September 12, 2024
Average Score
64.8%
Timeline
Key dates in the model's history
Announcement
September 12, 2024
Last Update
July 19, 2025
Technical Specifications
Parameters
-
Training Tokens
-
Knowledge Cutoff
-
Family
-
Capabilities
Multimodal · ZeroEval
Pricing & Availability
Input (per 1M tokens)
$15.00
Output (per 1M tokens)
$60.00
Max Input Tokens
128.0K
Max Output Tokens
32.8K
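At these rates, per-request cost is a straightforward weighted sum. A minimal sketch with hypothetical token counts (note that for reasoning models, hidden reasoning tokens are typically billed as output tokens, so real completions can cost more than the visible text suggests):

```python
# Cost of one o1-preview request at the listed rates.
INPUT_USD_PER_M = 15.00    # USD per 1M input tokens
OUTPUT_USD_PER_M = 60.00   # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

# Example: a 10,000-token prompt with a 2,000-token completion -> $0.27
print(f"${request_cost(10_000, 2_000):.2f}")
```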
Supported Features
Function Calling · Structured Output · Code Execution · Web Search · Batch Inference · Fine-tuning
Benchmark Results
Model performance metrics across various tests and benchmarks
General Knowledge
Tests on general knowledge and understanding
MMLU
Under pass@1 evaluation, only one answer is allowed per task. This mirrors the standard way AI models are used: when we ask a question, we receive a single answer, and if the model has no opportunity to gather more information or otherwise refine its response, the quality of that first answer is what matters. In our analysis, pass@1 measures the proportion of tasks for which the model's first and only answer is correct. • Self-reported
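In code, that definition amounts to a simple proportion. A minimal sketch (the grading results below are hypothetical):

```python
# pass@1 over a benchmark: the fraction of tasks whose single, first
# answer was graded correct.
def pass_at_1(first_answer_correct: list[bool]) -> float:
    return sum(first_answer_correct) / len(first_answer_correct)

# Example: the first answer was correct on 3 of 4 tasks -> 0.75
print(pass_at_1([True, True, False, True]))
```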
Programming
Programming skills tests
SWE-Bench Verified
# Process
To verify whether GPT-4 Turbo can actually ensure the accuracy of its answers, we used a method from the Verified-responses work. We developed the following process:
1. **Answer**: We give the model the task and ask GPT-4 Turbo to provide an answer together with its confidence level.
2. **Query justification**: We ask the model to provide a justification for its answer and to thoroughly verify its work.
3. **Re-check**: We ask the model to verify its justification once more and confirm whether it stands by its answer.
4. **Query confidence level**: We ask the model to rate its confidence on a scale from 1 to 5.
5. **"Not sure" option**: We allow the model to answer "not sure" if it is not fully confident in its answer.
We count an answer as "verified" only if the model reports full confidence (5/5), expresses no doubt about its answer, and provides a justification; an answer that fails this process is not counted as verified. This process encourages the model to check itself and gives it a set of opportunities to reduce the probability of errors. • Self-reported
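A sketch of the five-step loop described above. `ask_model()` is a hypothetical stand-in for a chat-completion API call, and the prompts are paraphrases for illustration, not the original ones:

```python
from typing import Callable, Optional

# Illustrative sketch of the self-verification protocol; every prompt
# string here is an assumption, not the original wording.
def verified_answer(task: str, ask_model: Callable[[str], str]) -> Optional[str]:
    # 1. Answer: solve the task and report a confidence level.
    answer = ask_model(f"Solve the task and state your confidence:\n{task}")
    # 2. Justification: explain the answer and check the work.
    justification = ask_model(f"Justify this answer and verify your work:\n{answer}")
    # 3. Re-check: verify the justification one more time.
    recheck = ask_model(f"Verify this justification once more. "
                        f"Do you stand by the answer?\n{justification}")
    # 4. Confidence: rate confidence on a 1-5 scale.
    confidence = int(ask_model("Rate your confidence from 1 to 5. "
                               "Reply with a single digit."))
    # 5. Opt-out: only fully confident, doubt-free answers count as verified.
    if confidence < 5 or "not sure" in recheck.lower():
        return None  # not counted as verified
    return answer
```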
Mathematics
Mathematical problems and computations
MATH
pass@1. This score is the probability that the model produces a correct answer on its first attempt. It measures the proportion of tasks the model solves correctly without any opportunity for hints or retries. To compute the metric, one solution is sampled from the model for each task, and the score is the proportion of tasks for which that answer is correct. It is a strict measure, since it allows no attempts at improving or correcting the answer, and it is especially important in contexts where an exact answer is required and there is no opportunity for repeated attempts or further deliberation. • Self-reported
MGSM
pass@1. This method estimates the probability that the model produces a correct answer on its first attempt. It is based on the observation that a model can give different answers to the same task across runs (for example, at different temperatures). To compute pass@1, we first sample several answers from the model for one and the same task (with different random seeds or temperatures) and then estimate the probability that the first attempt is correct. Suppose we use the model to solve coding tasks: we could generate several solutions for each task (for example, 100 solutions at various temperatures) and then evaluate what percentage of them is correct. This yields the probability that the model answers correctly on the first attempt. Such an approach gives a more robust estimate of the model's capabilities than checking a single answer, since it accounts for the model's randomness. • Self-reported
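The sampling procedure described here is commonly computed with the unbiased pass@k estimator introduced with HumanEval. A sketch, assuming n samples per task of which c passed:

```python
from math import comb

# Unbiased pass@k estimator: given n sampled solutions per task, c of
# which are correct, estimate the chance that at least one of k randomly
# drawn samples is correct. For k=1 this reduces to c/n.
def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # too few failures to fill k draws: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 100 samples per task, 30 correct -> pass@1 estimate of 0.30
print(pass_at_k(100, 30, 1))
```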
Reasoning
Logical reasoning and analysis
GPQA
Pass@1 is a metric used to evaluate a model's effectiveness at solving tasks in a single pass. Unlike trial-and-error methods, which allow the model several attempts, pass@1 measures the model's ability to produce a correct solution on the first attempt. This metric is especially important for assessing a model's ability to solve complex mathematical and logical tasks, where the first solution should be correct. A high pass@1 score indicates that the model understands the problem and can effectively apply the relevant knowledge and solution strategies without needing an iterative approach. Under pass@1 evaluation, the model receives only one attempt at each task, and the result is the proportion of tasks solved correctly on the first attempt. This metric is considered stricter and closer to real-world use, where users usually expect an exact answer immediately rather than after several tries. • Self-reported
Other Tests
Specialized benchmarks
AIME 2024
pass@1: only the model's first answer to each problem counts toward the score. • Self-reported
LiveBench
Coding: this category covers queries that ask to write code, explain code, or work with code in any way. It includes writing code, explaining code, finding bugs, and related requests. For example: "write a function in Python for … words in …", "an SQL query to extract users … after 2022", "… this HTML", "what does this JavaScript do", "… for Android", "How can I improve this …?", "convert this code from R to Python". It also includes queries about HTML, CSS, JSON, YAML, and other similar formats, even if they are not strictly code. • Self-reported
SimpleQA
Factual accuracy. A model trained on data acquires the ability to recall knowledge, especially in certain fields. If a specific question appears in the model's training data, whether it is answered correctly will, in general, depend on the model's size. • Self-reported
License & Metadata
License
proprietary
Announcement Date
September 12, 2024
Last Updated
July 19, 2025
Similar Models
GPT-4 Turbo
OpenAI
Best score: 0.9 (HumanEval)
Released: Apr 2024
Price: $10.00/1M tokens
o1-mini
OpenAI
Best score: 0.9 (HumanEval)
Released: Sep 2024
Price: $3.00/1M tokens
o1
OpenAI
Best score: 0.9 (MMLU)
Released: Dec 2024
Price: $15.00/1M tokens
GPT-5 Codex
OpenAI
Released: Sep 2025
Price: $2.00/1M tokens
o3-mini
OpenAI
Best score: 0.9 (MMLU)
Released: Jan 2025
Price: $1.10/1M tokens
GPT-3.5 Turbo
OpenAI
Best score: 0.7 (MMLU)
Released: Mar 2023
Price: $0.50/1M tokens
o3
OpenAI
Multimodal
Best score: 0.8 (GPQA)
Released: Apr 2025
Price: $2.00/1M tokens
o1-pro
OpenAI
Multimodal
Best score: 0.8 (GPQA)
Released: Dec 2024
Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter size, and benchmark performance. Choose a model to compare or go to the full catalog to browse all available AI models.