Pro-like reasoning at the Flash tier’s latency and cost
The first Flash-tier release in the Gemini 3 family; adaptive thinking spends more effort on complex requests.
Overview
Gemini 3 Flash is Google DeepMind's fast frontier model, released on December 17, 2025. It brings Pro-like reasoning into the Flash tier's latency, efficiency, and cost profile, and is the first Flash-tier release in the Gemini 3 family. CrossModel exposes the preview SKU as gemini/gemini-3-flash-preview.
Compared with Gemini 2.5 Pro, Gemini 3 Flash is both smarter and significantly faster. Artificial Analysis measurements cited in the original article put it at roughly 3× the speed and about 30% fewer tokens on everyday tasks. It uses adaptive thinking, automatically spending more effort on complex requests.
Key capabilities
| Dimension | Detail |
|---|---|
| Context window | 1,048,576 tokens (about 1M) |
| Max output | 65,536 tokens (about 64K) |
| Input modalities | Text, image (Google's native model also supports audio and video) |
| Output modalities | Text |
| Tools | function calling, structured outputs, streaming, adaptive thinking, multi-step tool use |
Flash-tier models use a single pricing tier without a long-context surcharge, which suits scaled, frequent, low-latency calls. See live rates in the model catalog.
Benchmarks
For a Flash-tier model, the headline is that frontier-grade scores arrive together with much lower latency and token usage.
Frontier scores rare at the Flash tier
HLE is without tools; GPQA Diamond reaches graduate-level science reasoning.
Google's release data reports GPQA Diamond 90.4% at the frontier of graduate-level science reasoning, MMMU-Pro 81.2% for multimodal understanding, and SWE-bench Verified 78% — agentic coding rarely seen at the Flash tier. The no-tools Humanity's Last Exam 33.7% shows it keeps real footing on hard academic questions.
Speed and efficiency
Comparable intelligence at lower cost and latency
Baseline is Gemini 2.5 Pro; figures from Artificial Analysis speed measurements.
What ultimately defines its value is the lower row: those scores arrive with about 3× the speed of 2.5 Pro and roughly 30% fewer tokens on average. The same level of intelligence at much lower response time and better unit economics is exactly what the Flash tier is for.
When to use it
- High-frequency low-latency services: support, RAG, realtime summaries, and online workflows.
- Scaled agents: batch tool calls and code tasks where throughput matters.
- Multimodal understanding: images, charts, and documents that need quick parsing.
- Cost-sensitive reasoning: near-Pro quality when budget or latency is constrained.
CrossModel exposes Gemini 3 Flash through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.