Gemini 3 Flash Preview

Pro-like reasoning at the Flash tier’s latency and cost

Context window

1,048,576 tokens

Max output

64K

65,536 tokens

Thinking

Adaptive

Auto effort scaling

High-frequency low-latency

Support, RAG, realtime summaries, online flows

Scaled agents

Batch tool calls and code tasks where throughput matters

Multimodal

Quick parsing of images, charts, documents with structured output

The first Flash-tier release in the Gemini 3 family; adaptive thinking spends more effort on complex requests.

Overview

Gemini 3 Flash is Google DeepMind's fast frontier model, released on December 17, 2025. It brings Pro-like reasoning into the Flash tier's latency, efficiency, and cost profile, and is the first Flash-tier release in the Gemini 3 family. CrossModel exposes the preview SKU as gemini/gemini-3-flash-preview.

Compared with Gemini 2.5 Pro, Gemini 3 Flash is both smarter and significantly faster. Artificial Analysis measurements cited in the original article put it at roughly 3× the speed and about 30% fewer tokens on everyday tasks. It uses adaptive thinking, automatically spending more effort on complex requests.

Key capabilities

Dimension	Detail
Context window	1,048,576 tokens (about 1M)
Max output	65,536 tokens (about 64K)
Input modalities	Text, image (Google's native model also supports audio and video)
Output modalities	Text
Tools	function calling, structured outputs, streaming, adaptive thinking, multi-step tool use

Flash-tier models use a single pricing tier without a long-context surcharge, which suits scaled, frequent, low-latency calls. See live rates in the model catalog.

Benchmarks

For a Flash-tier model, the headline is that frontier-grade scores arrive together with much lower latency and token usage.

Frontier Benchmarks

Frontier scores rare at the Flash tier

GPQA Diamond

90.4%

MMMU-Pro

81.2%

SWE-bench Verified

78%

Humanity's Last Exam

33.7%

No tools

HLE is without tools; GPQA Diamond reaches graduate-level science reasoning.

Google's release data reports GPQA Diamond 90.4% at the frontier of graduate-level science reasoning, MMMU-Pro 81.2% for multimodal understanding, and SWE-bench Verified 78% — agentic coding rarely seen at the Flash tier. The no-tools Humanity's Last Exam 33.7% shows it keeps real footing on hard academic questions.

Speed and efficiency

Speed & Efficiency

Comparable intelligence at lower cost and latency

Speed

≈3×

vs 2.5 Pro

Token usage

-30%

Everyday tasks, avg

Intelligence

≈Pro

Comparable on many tasks

Baseline is Gemini 2.5 Pro; figures from Artificial Analysis speed measurements.

What ultimately defines its value is the lower row: those scores arrive with about 3× the speed of 2.5 Pro and roughly 30% fewer tokens on average. The same level of intelligence at much lower response time and better unit economics is exactly what the Flash tier is for.

When to use it

High-frequency low-latency services: support, RAG, realtime summaries, and online workflows.
Scaled agents: batch tool calls and code tasks where throughput matters.
Multimodal understanding: images, charts, and documents that need quick parsing.
Cost-sensitive reasoning: near-Pro quality when budget or latency is constrained.

CrossModel exposes Gemini 3 Flash through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.