A fast, efficient frontier model that approaches Pro tier
Adjustable thinking levels: low effort for simple requests, higher effort for complex reasoning.
Overview
Gemini 3.5 Flash is Google DeepMind's fast, efficient frontier model, released on May 19, 2026. Built on Gemini 3 Flash, it adds adjustable thinking levels so developers can trade quality, cost, and latency per task: low effort for simple requests, higher effort for complex reasoning.
Its positioning is unusual for a Flash model. On agentic coding and multi-step tool use, 3.5 Flash approaches Pro-tier results in several benchmarks while keeping the speed and cost profile expected from Flash. CrossModel exposes it as gemini/gemini-3.5-flash.
Key capabilities
| Dimension | Detail |
|---|---|
| Context window | 1,048,576 tokens (about 1M) |
| Max output | 65,536 tokens (about 64K) |
| Input modalities | Text, image (Google's native model also supports audio and video) |
| Output modalities | Text |
| Tools | function calling, structured outputs, streaming, adjustable thinking levels, multi-step tool use (MCP) |
Flash-tier models use a single pricing tier without a long-context surcharge, which suits frequent, low-latency, cost-sensitive workloads. See live rates in the model catalog.
Benchmarks
The core evaluation story is agentic engineering and tool use: keep latency low while making repeated tool calls, terminal work, computer use, and cross-step correction reliable.
Agentic engineering and tool use are the evaluation spine
MCP Atlas and OSWorld-Verified measure click, type, read/write, and verify success in real agent workflows.
MCP Atlas 83.6% and OSWorld-Verified 78.4% are the two clearest signals for real agent workflows — they measure click, type, read/write, and verify success. Terminal-bench 2.1 76.2% and SWE-bench Pro 55.1% cover terminal and diverse software-engineering tasks, while MMMU-Pro 83.6% shows multimodal understanding is not sacrificed for the Flash form factor.
Reasoning and long context
Holds first-tier Flash results at higher thinking levels
@1M full-window 26.6% is an honest reminder that precise full-window retrieval is still hard industry-wide.
With higher thinking levels it reaches Humanity's Last Exam 40.2%, ARC-AGI-2 72.1%, and CharXiv 84.2% chart reasoning — all first-tier Flash results. Long-context retrieval is solid at MRCR v2 @128K 77.3%, while @1M 26.6% is an honest reminder that precise full-window retrieval remains difficult across the industry; reach for the Pro tier when you need high-fidelity retrieval across a full 1M window.
When to use it
- Scaled agent workflows: frequent tool calls, computer use, and terminal automation at low latency.
- Cost-sensitive coding assistants: IDE workflows, batch repairs, and CI code checks.
- High-volume multimodal tasks: screenshots, charts, and documents with structured outputs.
- Mixed workloads: run simple requests at low effort and raise thinking for harder cases — one model ID.
CrossModel exposes Gemini 3.5 Flash through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.