GPT-5.4 Mini

GPT-5.4 strengths in a high-throughput small model

Context window

400K

tokens

Max output

128K

tokens

Codex quota

30%

versus GPT-5.4

Coding

Focused fixes, tests, pre-review, and subtask implementation

Computer use

Screenshot understanding, UI state checks, and light operation

Subagents

Parallel retrieval, summarization, extraction, and draft generation

A strong fit for parallel subagents and frequent production traffic that needs more than simple classification.

Overview

GPT-5.4 Mini is the high-throughput small model in OpenAI's GPT-5.4 family. OpenAI describes it as its strongest mini model yet for coding, computer use, and subagents: a model that brings much of GPT-5.4's agentic shape to workloads where latency, concurrency, and cost discipline matter.

It keeps a 400,000-token context window, 128,000 output tokens, text and image input, text output, reasoning tokens, structured outputs, function calling, and the newer tool stack. In a multi-model system, Mini works well as the "worker thread" beside a flagship model: inspect files, summarize evidence, make small code changes, classify screenshots, or run focused subtasks in parallel.

Key capabilities

Dimension	Detail
Context window	400,000 tokens
Max output	128,000 tokens
Input modalities	Text, image
Output modalities	Text
Tools	web search, file search, image generation, code interpreter, hosted shell, apply patch, skills, computer use, MCP, tool search

CrossModel aligns GPT-5.4 Mini with OpenAI's 400K context window. Current pricing is shown in the live model catalog.

Coding and subagents

Benchmarks

Better than Nano for coding, vision, and computer-use subtasks

SWE-Bench Pro

54.4%

GPT-5.4: 57.7%

Terminal-Bench 2.0

60.0%

GPT-5 mini: 38.2%

OSWorld-Verified

72.1%

GPT-5.4: 75.0%

Mini approaches the main model in several places, but high-risk final judgments still belong with GPT-5.4 or GPT-5.5.

OpenAI reports GPT-5.4 Mini at 54.4% on SWE-Bench Pro Public and 60.0% on Terminal-Bench 2.0. Those scores sit close enough to GPT-5.4 for many supporting engineering tasks, while the model is explicitly designed for faster, more efficient high-volume use.

The best pattern is delegation. Let Mini handle code search, draft patches, test generation, issue triage, and PR pre-review. Keep the final architecture call or risky production change on GPT-5.4, GPT-5.5, or GPT-5.5 Pro.

Vision and medium-long context

Long Context & Vision

A 400K window for medium-long documents and multi-file subtasks

MRCR 64K-128K

47.7%

8-needle

Graphwalks BFS

76.3%

0K-128K

MMMUPro

76.6%

with Python: 78.0%

400K is very useful in production; near-1M retrieval and critical conclusions are still better served by flagship models.

Mini is also strong enough for multimodal and computer-use subtasks: 72.1% on OSWorld-Verified, 76.6% on MMMUPro, and 78.0% on MMMUPro with Python. For long context, it reaches 47.7% on MRCR v2 8-needle in the 64K-128K range and 76.3% on Graphwalks BFS 0K-128K.

When to use it

Parallel subagents: retrieval, summarization, code search, evidence extraction, and draft generation.
High-throughput coding help: small refactors, tests, lint fixes, issue triage, and PR pre-review.
Vision workflows: screenshots, document images, UI state classification, and light computer-use tasks.
Medium-long extraction: multi-document classification and structured output inside a 400K context.

CrossModel exposes GPT-5.4 Mini through an OpenAI-compatible API. Current pricing is available in the model catalog.