CrossModel
Back to model catalog

Gemini · Model guide

Gemini 3 Flash Preview

gemini/gemini-3-flash-preview
Modalities
TextImageAudioVideoText
Context
1M
Max output
66K
Gemini 3 Flash Preview

Pro-like reasoning at the Flash tier’s latency and cost

Context window
1M
1,048,576 tokens
Max output
64K
65,536 tokens
Thinking
Adaptive
Auto effort scaling
High-frequency low-latency
Support, RAG, realtime summaries, online flows
Scaled agents
Batch tool calls and code tasks where throughput matters
Multimodal
Quick parsing of images, charts, documents with structured output

The first Flash-tier release in the Gemini 3 family; adaptive thinking spends more effort on complex requests.

Overview

Gemini 3 Flash is Google DeepMind's fast frontier model, released on December 17, 2025. It brings Pro-like reasoning into the Flash tier's latency, efficiency, and cost profile, and is the first Flash-tier release in the Gemini 3 family. CrossModel exposes the preview SKU as gemini/gemini-3-flash-preview.

Compared with Gemini 2.5 Pro, Gemini 3 Flash is both smarter and significantly faster. Artificial Analysis measurements cited in the original article put it at roughly the speed and about 30% fewer tokens on everyday tasks. It uses adaptive thinking, automatically spending more effort on complex requests.

Key capabilities

DimensionDetail
Context window1,048,576 tokens (about 1M)
Max output65,536 tokens (about 64K)
Input modalitiesText, image (Google's native model also supports audio and video)
Output modalitiesText
Toolsfunction calling, structured outputs, streaming, adaptive thinking, multi-step tool use

Flash-tier models use a single pricing tier without a long-context surcharge, which suits scaled, frequent, low-latency calls. See live rates in the model catalog.

Benchmarks

For a Flash-tier model, the headline is that frontier-grade scores arrive together with much lower latency and token usage.

Frontier Benchmarks

Frontier scores rare at the Flash tier

GPQA Diamond
90.4%
MMMU-Pro
81.2%
SWE-bench Verified
78%
Humanity's Last Exam
33.7%
No tools

HLE is without tools; GPQA Diamond reaches graduate-level science reasoning.

Google's release data reports GPQA Diamond 90.4% at the frontier of graduate-level science reasoning, MMMU-Pro 81.2% for multimodal understanding, and SWE-bench Verified 78% — agentic coding rarely seen at the Flash tier. The no-tools Humanity's Last Exam 33.7% shows it keeps real footing on hard academic questions.

Speed and efficiency

Speed & Efficiency

Comparable intelligence at lower cost and latency

Speed
≈3×
vs 2.5 Pro
Token usage
-30%
Everyday tasks, avg
Intelligence
≈Pro
Comparable on many tasks

Baseline is Gemini 2.5 Pro; figures from Artificial Analysis speed measurements.

What ultimately defines its value is the lower row: those scores arrive with about 3× the speed of 2.5 Pro and roughly 30% fewer tokens on average. The same level of intelligence at much lower response time and better unit economics is exactly what the Flash tier is for.

When to use it

  • High-frequency low-latency services: support, RAG, realtime summaries, and online workflows.
  • Scaled agents: batch tool calls and code tasks where throughput matters.
  • Multimodal understanding: images, charts, and documents that need quick parsing.
  • Cost-sensitive reasoning: near-Pro quality when budget or latency is constrained.

CrossModel exposes Gemini 3 Flash through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.