GPT-5.1 Codex High vs Step-3.5-Flash: Specs & Benchmark Comparison

Characteristic	GPT-5.1 Codex High	Step-3.5-Flash
Company	OpenAI	StepFun
Release Date	November 11, 2025	February 1, 2026
Parameters	—	196B
Multimodal	Yes	Yes
Context (input)	400K	66K
Context (output)	128K	8K
Input Price / 1M tokens	$1.25	$0.10
Output Price / 1M tokens	$10.00	$0.40
Average Score	1.0	0.8
Benchmarks
AIME 2025	1.0	1.0

Visual Benchmark Comparison

[Chart: AIME 2025, GPT-5.1 Codex High 1.0 vs Step-3.5-Flash 1.0]

Verdict

Step-3.5-Flash leads in 2 of the 4 comparison categories: API cost and recency.

Overall Performance

GPT-5.1 Codex High has the higher average score, 1.0 vs 0.8 for Step-3.5-Flash, although both models score 1.0 on AIME 2025, the only benchmark listed.

API Cost

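Step-3.5-Flash is 12.5x cheaper on input ($0.10 vs $1.25 per 1M tokens) and 25x cheaper on output ($0.40 vs $10.00 per 1M tokens). Summing the input and output prices gives a 22.5x difference ($0.50 vs $11.25).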
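
As a rough illustration, the listed prices can be turned into cost ratios and a per-job estimate. This is a minimal Python sketch: the prices are copied from the spec table, while the function names and the token counts in the usage example are hypothetical and not part of either provider's API.

```python
# Prices in USD per 1M tokens, copied from the spec table.
PRICES = {
    "GPT-5.1 Codex High": {"input": 1.25, "output": 10.00},
    "Step-3.5-Flash": {"input": 0.10, "output": 0.40},
}


def price_ratio(kind: str) -> float:
    """How many times cheaper Step-3.5-Flash is for 'input' or 'output'."""
    return PRICES["GPT-5.1 Codex High"][kind] / PRICES["Step-3.5-Flash"][kind]


def combined_ratio() -> float:
    """Ratio of the summed input + output prices of the two models."""
    gpt = sum(PRICES["GPT-5.1 Codex High"].values())
    step = sum(PRICES["Step-3.5-Flash"].values())
    return gpt / step


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one job at the listed per-1M-token prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


if __name__ == "__main__":
    print(f"input ratio:    {price_ratio('input'):.1f}x")
    print(f"output ratio:   {price_ratio('output'):.1f}x")
    print(f"combined ratio: {combined_ratio():.1f}x")
    # Hypothetical workload: 50,000 input tokens and 5,000 output tokens.
    for model in PRICES:
        print(f"{model}: ${job_cost(model, 50_000, 5_000):.4f}")
```

The real saving depends on the input/output mix of the workload: jobs dominated by output tokens sit closer to the 25x end, and input-heavy jobs closer to 12.5x.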

Context Window

GPT-5.1 Codex High supports a larger context: 400K vs 66K tokens.
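
To make the context limits concrete, here is a small Python sketch that checks which model can accept a given request. The input limits (400,000 and 65,536 tokens) come from this page; the output limits are assumed to be 128,000 and 8,000 tokens, reading "K" in the table as 1,000, and input and output are assumed to be budgeted separately. The function names are hypothetical.

```python
# Token limits per model. Input limits are the figures given on this page;
# output limits are an assumption (table values 128K and 8K read as x1,000).
LIMITS = {
    "GPT-5.1 Codex High": {"input": 400_000, "output": 128_000},
    "Step-3.5-Flash": {"input": 65_536, "output": 8_000},
}


def fits(model: str, prompt_tokens: int, output_tokens: int) -> bool:
    """True if the prompt and the requested output both fit the model's limits.

    Assumes input and output limits apply independently, as the table lists them.
    """
    limit = LIMITS[model]
    return prompt_tokens <= limit["input"] and output_tokens <= limit["output"]


def models_that_fit(prompt_tokens: int, output_tokens: int) -> list[str]:
    """Names of all models that can handle the request, in table order."""
    return [m for m in LIMITS if fits(m, prompt_tokens, output_tokens)]


if __name__ == "__main__":
    # Hypothetical request sizes for illustration.
    print(models_that_fit(50_000, 4_000))
    print(models_that_fit(100_000, 4_000))
```

A check like this is worth running before choosing on price alone, since a prompt that exceeds 65,536 tokens rules out Step-3.5-Flash regardless of cost.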

Recency

Step-3.5-Flash is newer: released February 1, 2026 vs November 11, 2025.

More About These Models

GPT-5.1 Codex High

OpenAI — specs, benchmarks, API

Step-3.5-Flash

StepFun — specs, benchmarks, API

Related Comparisons

GPT-5.1 Codex High vs GPT-5.2
GPT-5.1 Codex High vs GPT-5.1 Medium
GPT-5.1 Codex High vs GPT-5.1 High
GPT-5.1 Codex High vs GPT-5.4
GPT-5 High vs GPT-5.1 Codex High
GPT-5.1 Codex High vs GPT-5.1 Thinking

Frequently Asked Questions

Which is better for coding — GPT-5.1 Codex High or Step-3.5-Flash?

No direct SWE-Bench comparison is available for these two models. The only shared benchmark on this page is AIME 2025, where both score 1.0, so review the other metrics above before choosing.

Which model is cheaper — GPT-5.1 Codex High or Step-3.5-Flash?

Step-3.5-Flash is cheaper for both input ($0.10 vs $1.25 per 1M tokens) and output ($0.40 vs $10.00 per 1M tokens).

Which has a larger context window — GPT-5.1 Codex High or Step-3.5-Flash?

GPT-5.1 Codex High supports a larger context: 400,000 tokens vs 65,536.

The GPT-5.1 Codex High and Step-3.5-Flash comparison is updated for 2026. Data includes benchmark results, API pricing, context window size and other specifications. For more detailed information, visit the GPT-5.1 Codex High or Step-3.5-Flash page. See also the complete list of AI model comparisons.