Gemini 3.5 Flash

A fast, efficient frontier model that approaches Pro tier

Context window

1,048,576 tokens

Max output

64K

65,536 tokens

Thinking

Levels

Adjustable effort

Scaled agents

Frequent tool calls, computer use, terminal automation

Cost-sensitive coding

IDE workflows, batch repairs, CI code checks

High-volume multimodal

Screenshots, charts, documents, structured outputs

Adjustable thinking levels: low effort for simple requests, higher effort for complex reasoning.

Overview

Gemini 3.5 Flash is Google DeepMind's fast, efficient frontier model, released on May 19, 2026. Built on Gemini 3 Flash, it adds adjustable thinking levels so developers can trade quality, cost, and latency per task: low effort for simple requests, higher effort for complex reasoning.

Its positioning is unusual for a Flash model. On agentic coding and multi-step tool use, 3.5 Flash approaches Pro-tier results in several benchmarks while keeping the speed and cost profile expected from Flash. CrossModel exposes it as gemini/gemini-3.5-flash.

Key capabilities

Dimension	Detail
Context window	1,048,576 tokens (about 1M)
Max output	65,536 tokens (about 64K)
Input modalities	Text, image (Google's native model also supports audio and video)
Output modalities	Text
Tools	function calling, structured outputs, streaming, adjustable thinking levels, multi-step tool use (MCP)

Flash-tier models use a single pricing tier without a long-context surcharge, which suits frequent, low-latency, cost-sensitive workloads. See live rates in the model catalog.

Benchmarks

The core evaluation story is agentic engineering and tool use: keep latency low while making repeated tool calls, terminal work, computer use, and cross-step correction reliable.

Agentic Benchmarks

Agentic engineering and tool use are the evaluation spine

MCP Atlas

83.6%

OSWorld-Verified

78.4%

Computer use

Terminal-bench 2.1

76.2%

SWE-bench Pro

55.1%

MMMU-Pro

83.6%

MCP Atlas and OSWorld-Verified measure click, type, read/write, and verify success in real agent workflows.

MCP Atlas 83.6% and OSWorld-Verified 78.4% are the two clearest signals for real agent workflows — they measure click, type, read/write, and verify success. Terminal-bench 2.1 76.2% and SWE-bench Pro 55.1% cover terminal and diverse software-engineering tasks, while MMMU-Pro 83.6% shows multimodal understanding is not sacrificed for the Flash form factor.

Reasoning and long context

Reasoning & Long Context

Holds first-tier Flash results at higher thinking levels

Humanity's Last Exam

40.2%

ARC-AGI-2

72.1%

CharXiv

84.2%

Chart reasoning

MRCR v2 @128K

77.3%

MRCR v2 @1M

26.6%

Full-window retrieval

@1M full-window 26.6% is an honest reminder that precise full-window retrieval is still hard industry-wide.

With higher thinking levels it reaches Humanity's Last Exam 40.2%, ARC-AGI-2 72.1%, and CharXiv 84.2% chart reasoning — all first-tier Flash results. Long-context retrieval is solid at MRCR v2 @128K 77.3%, while @1M 26.6% is an honest reminder that precise full-window retrieval remains difficult across the industry; reach for the Pro tier when you need high-fidelity retrieval across a full 1M window.

When to use it

Scaled agent workflows: frequent tool calls, computer use, and terminal automation at low latency.
Cost-sensitive coding assistants: IDE workflows, batch repairs, and CI code checks.
High-volume multimodal tasks: screenshots, charts, and documents with structured outputs.
Mixed workloads: run simple requests at low effort and raise thinking for harder cases — one model ID.

CrossModel exposes Gemini 3.5 Flash through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.