DeepSeek V4 Flash · Model guide

DeepSeek V4 Flash spec overview

Overview

DeepSeek V4 Flash is the efficient member of the V4 Preview family, released with V4 Pro on April 24, 2026. DeepSeek lists it at 284B total / 13B active parameters and describes it as the fast, efficient, economical option while keeping the full V4 feature set: 1M context, Thinking / Non-Thinking modes, and OpenAI / Anthropic-compatible APIs.

If V4 Pro is the high-capability tier for difficult reasoning and agent work, V4 Flash is the default entry point: chat, batch processing, retrieval preprocessing, structured extraction, simple agent steps, and verifiable code tasks can start on Flash, with only harder cases escalated to Pro. It shares Pro's API surface, so switching usually means changing only the model value in the request.

Key capabilities

Dimension	Detail
Context window	1,000,000 tokens
Max output	65,000 tokens (CrossModel configuration)
Input modalities	Text
Output modalities	Text
Architecture	284B total / 13B active MoE
Tools	streaming, JSON output, tool calls, Thinking / Non-Thinking

Both V4 Flash and V4 Pro support 1M context, JSON Output, Tool Calls, and Chat Prefix Completion; FIM Completion is limited to Non-Thinking mode. 1M context is the default capability across all V4 services, not a special add-on. See live pricing in the model catalog.

Benchmarks and workflow role

V4 Flash is not meant to replace Pro on every hard task. Its value is lowering latency and cost across the many steps that can be decomposed, checked, and upgraded. DeepSeek's technical report shows V4 Flash Max landing close to Pro on several reasoning and agent benchmarks.

Workflow

Default entry, escalate the hard cases

GPQA Diamond

88.1

LiveCodeBench

91.6

SWE Verified

79.0

MRCR 1M

78.7

Inbound requests

Docs, code, retrieval preprocessing

Flash handles

Extraction · summary · classification · drafts

Confidence gate

Low-confidence or hard samples

Escalate to V4 Pro

Hand off final reasoning to the flagship tier

Flash Max already lands close to Pro on several reasoning and agent benchmarks: keep routine steps on Flash and pass only low-confidence samples to V4 Pro.

Flash Max is already strong on GPQA Diamond 88.1, LiveCodeBench 91.6, SWE Verified 79.0, and MRCR 1M 78.7. It trails Pro on SimpleQA Verified, Terminal Bench 2.0, BrowseComp, and other harder agent tasks — which maps cleanly onto a natural tiering: a typical agent stack places Flash in the first layer for retrieval, summarization, classification, field extraction, and candidate generation, then passes low-confidence or reasoning-heavy samples to V4 Pro. The MRCR 1M 78.7 score also shows Flash is usable for million-token long-context retrieval, so it can take on large-document and codebase work directly.

When to use it

High-volume chat and support: low latency and lower cost for default user-facing interactions.
Batch content processing: long-context JSON and tool workflows for extraction, rewriting, and cleaning.
Agent default tier: run routine retrieval, planning, and code edits on Flash before escalating.
Long-document Q&A: put documents, logs, or codebases directly into the prompt window.

CrossModel exposes DeepSeek V4 Flash through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.