CrossModel
Back to model catalog

DeepSeek · Model guide

DeepSeek V4 Flash

deepseek/deepseek-v4-flash
Modalities
TextText
Context
1M
Max output
65K

DeepSeek V4 Flash spec overview

Overview

DeepSeek V4 Flash is the efficient member of the V4 Preview family, released with V4 Pro on April 24, 2026. DeepSeek lists it at 284B total / 13B active parameters and describes it as the fast, efficient, economical option while keeping the full V4 feature set: 1M context, Thinking / Non-Thinking modes, and OpenAI / Anthropic-compatible APIs.

If V4 Pro is the high-capability tier for difficult reasoning and agent work, V4 Flash is the default entry point: chat, batch processing, retrieval preprocessing, structured extraction, simple agent steps, and verifiable code tasks can start on Flash, with only harder cases escalated to Pro. It shares Pro's API surface, so switching usually means changing only the model value in the request.

Key capabilities

DimensionDetail
Context window1,000,000 tokens
Max output65,000 tokens (CrossModel configuration)
Input modalitiesText
Output modalitiesText
Architecture284B total / 13B active MoE
Toolsstreaming, JSON output, tool calls, Thinking / Non-Thinking

Both V4 Flash and V4 Pro support 1M context, JSON Output, Tool Calls, and Chat Prefix Completion; FIM Completion is limited to Non-Thinking mode. 1M context is the default capability across all V4 services, not a special add-on. See live pricing in the model catalog.

Benchmarks and workflow role

V4 Flash is not meant to replace Pro on every hard task. Its value is lowering latency and cost across the many steps that can be decomposed, checked, and upgraded. DeepSeek's technical report shows V4 Flash Max landing close to Pro on several reasoning and agent benchmarks.

Workflow

Default entry, escalate the hard cases

GPQA Diamond
88.1
LiveCodeBench
91.6
SWE Verified
79.0
MRCR 1M
78.7
01
Inbound requests
Docs, code, retrieval preprocessing
02
Flash handles
Extraction · summary · classification · drafts
03
Confidence gate
Low-confidence or hard samples
04
Escalate to V4 Pro
Hand off final reasoning to the flagship tier

Flash Max already lands close to Pro on several reasoning and agent benchmarks: keep routine steps on Flash and pass only low-confidence samples to V4 Pro.

Flash Max is already strong on GPQA Diamond 88.1, LiveCodeBench 91.6, SWE Verified 79.0, and MRCR 1M 78.7. It trails Pro on SimpleQA Verified, Terminal Bench 2.0, BrowseComp, and other harder agent tasks — which maps cleanly onto a natural tiering: a typical agent stack places Flash in the first layer for retrieval, summarization, classification, field extraction, and candidate generation, then passes low-confidence or reasoning-heavy samples to V4 Pro. The MRCR 1M 78.7 score also shows Flash is usable for million-token long-context retrieval, so it can take on large-document and codebase work directly.

When to use it

  • High-volume chat and support: low latency and lower cost for default user-facing interactions.
  • Batch content processing: long-context JSON and tool workflows for extraction, rewriting, and cleaning.
  • Agent default tier: run routine retrieval, planning, and code edits on Flash before escalating.
  • Long-document Q&A: put documents, logs, or codebases directly into the prompt window.

CrossModel exposes DeepSeek V4 Flash through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.