GPT-5.5

GPT-5.5

Overview

GPT-5.5 is OpenAI's flagship model released on April 23, 2026. It is aimed at real computer work rather than single-turn question answering: code, research, data analysis, documents, spreadsheets, browsers, and file systems can all live in the same workflow while the model keeps a long-running goal in view.

OpenAI's release material emphasizes better task completion at latency close to GPT-5.4, and more efficient token use on Codex tasks. In the API, gpt-5.5 is the default flagship; gpt-5.5-pro is the slower, higher-compute sibling for the hardest work.

Key capabilities

Dimension	Detail
Context window	1,050,000 tokens (about 1.05M)
Max output	128,000 tokens
Input modalities	Text, image
Output modalities	Text
Tools	function calling, structured outputs, streaming, web search, file search, image generation, code interpreter, hosted shell, apply patch, computer use, MCP

Inputs above 272K tokens enter a higher long-context tier (2x input and 1.5x output), covering standard, batch, and flex modes. See live pricing in the model catalog.

Benchmarks

GPT-5.5's evaluation story spans long-context retrieval, agentic coding, computer and knowledge work, security, and factual reliability. The theme is not winning every isolated benchmark; it is finishing long tasks more reliably than GPT-5.4.

Long-context retrieval

Long Context

Recovers key facts near the 1M-token range

MRCR v2 8-needle (512K-1M)

74.0%

GPT-5.4: 36.6%

Graphwalks BFS 1M

45.4%

F1 · GPT-5.4: 9.4%

Context window

1.05M

tokens

Compared with GPT-5.4 on the same axis, GPT-5.5 is not just larger-context; it is much more reliable from 512K to 1M.

On OpenAI MRCR v2 8-needle in the 512K-1M range, GPT-5.5 rises from GPT-5.4's 36.6% to 74.0%. Graphwalks BFS 1M F1 improves from 9.4% to 45.4%. This matters for large repos, long contracts, research packets, and multi-file tasks: the model does not just fit the context, it retrieves from it more reliably.

Coding and terminal work

Coding & Terminal

Agentic coding scores closer to real engineering

Terminal-Bench 2.0

82.7%

GPT-5.4: 75.1%

SWE-Bench Pro

58.6%

Public

Expert-SWE

73.1%

These tasks require reading the environment, running commands, editing code, observing failures, and iterating.

OpenAI called GPT-5.5 its strongest agentic coding model at release. It reaches 82.7% on Terminal-Bench 2.0, ahead of GPT-5.4 at 75.1%; 58.6% on SWE-Bench Pro Public; and 73.1% on Expert-SWE. These tests resemble real engineering loops: read the environment, run commands, edit, observe failure, and fix again.

Computer use, knowledge work, security, and reliability

Computer Use & Knowledge Work

Strength beyond code: documents, spreadsheets, and financial models

OSWorld-Verified

78.7%

GPT-5.4: 75.0%

GDPval

84.9%

wins or ties

IB Modeling

88.5%

investment-banking modeling

OfficeQA Pro

54.1%

From browser and desktop tasks to investment-banking modeling and office QA, GPT-5.5 is tuned for longer deliverable workflows.

OSWorld-Verified reaches 78.7%, above GPT-5.4's 75.0%. GDPval wins or ties is 84.9%, investment-banking modeling is 88.5%, and OfficeQA Pro is 54.1%.

Security

High-capability tier for vulnerability discovery and defensive work

CVE-Bench

93.1%

pass@1

Capture-the-Flags

88.1%

challenge tasks

CyberGym

81.8%

OpenAI treats cybersecurity capability as High capability, so production use should pair it with trusted access, logs, and permission boundaries.

Security evaluations include CVE-Bench 93.1% pass@1, Capture-the-Flags 88.1%, and CyberGym 81.8%.

Factual Reliability

When to use it

Long-context engineering: codebases, test output, design docs, and issues in one window.
Agentic workflows: browser, terminal, file search, code interpreter, and MCP together.
Professional knowledge work: financial models, research, contracts, reports, and decks.
Security and defensive work: vulnerability analysis, code review, and configuration checks with proper permissions and audit.

CrossModel exposes GPT-5.5 through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.