CrossModel
Back to model catalog

Gemini · Model guide

Gemini 3.5 Flash

gemini/gemini-3.5-flash
Modalities
TextImageAudioVideoText
Context
1M
Max output
66K
Gemini 3.5 Flash

A fast, efficient frontier model that approaches Pro tier

Context window
1M
1,048,576 tokens
Max output
64K
65,536 tokens
Thinking
Levels
Adjustable effort
Scaled agents
Frequent tool calls, computer use, terminal automation
Cost-sensitive coding
IDE workflows, batch repairs, CI code checks
High-volume multimodal
Screenshots, charts, documents, structured outputs

Adjustable thinking levels: low effort for simple requests, higher effort for complex reasoning.

Overview

Gemini 3.5 Flash is Google DeepMind's fast, efficient frontier model, released on May 19, 2026. Built on Gemini 3 Flash, it adds adjustable thinking levels so developers can trade quality, cost, and latency per task: low effort for simple requests, higher effort for complex reasoning.

Its positioning is unusual for a Flash model. On agentic coding and multi-step tool use, 3.5 Flash approaches Pro-tier results in several benchmarks while keeping the speed and cost profile expected from Flash. CrossModel exposes it as gemini/gemini-3.5-flash.

Key capabilities

DimensionDetail
Context window1,048,576 tokens (about 1M)
Max output65,536 tokens (about 64K)
Input modalitiesText, image (Google's native model also supports audio and video)
Output modalitiesText
Toolsfunction calling, structured outputs, streaming, adjustable thinking levels, multi-step tool use (MCP)

Flash-tier models use a single pricing tier without a long-context surcharge, which suits frequent, low-latency, cost-sensitive workloads. See live rates in the model catalog.

Benchmarks

The core evaluation story is agentic engineering and tool use: keep latency low while making repeated tool calls, terminal work, computer use, and cross-step correction reliable.

Agentic Benchmarks

Agentic engineering and tool use are the evaluation spine

MCP Atlas
83.6%
OSWorld-Verified
78.4%
Computer use
Terminal-bench 2.1
76.2%
SWE-bench Pro
55.1%
MMMU-Pro
83.6%

MCP Atlas and OSWorld-Verified measure click, type, read/write, and verify success in real agent workflows.

MCP Atlas 83.6% and OSWorld-Verified 78.4% are the two clearest signals for real agent workflows — they measure click, type, read/write, and verify success. Terminal-bench 2.1 76.2% and SWE-bench Pro 55.1% cover terminal and diverse software-engineering tasks, while MMMU-Pro 83.6% shows multimodal understanding is not sacrificed for the Flash form factor.

Reasoning and long context

Reasoning & Long Context

Holds first-tier Flash results at higher thinking levels

Humanity's Last Exam
40.2%
ARC-AGI-2
72.1%
CharXiv
84.2%
Chart reasoning
MRCR v2 @128K
77.3%
MRCR v2 @1M
26.6%
Full-window retrieval

@1M full-window 26.6% is an honest reminder that precise full-window retrieval is still hard industry-wide.

With higher thinking levels it reaches Humanity's Last Exam 40.2%, ARC-AGI-2 72.1%, and CharXiv 84.2% chart reasoning — all first-tier Flash results. Long-context retrieval is solid at MRCR v2 @128K 77.3%, while @1M 26.6% is an honest reminder that precise full-window retrieval remains difficult across the industry; reach for the Pro tier when you need high-fidelity retrieval across a full 1M window.

When to use it

  • Scaled agent workflows: frequent tool calls, computer use, and terminal automation at low latency.
  • Cost-sensitive coding assistants: IDE workflows, batch repairs, and CI code checks.
  • High-volume multimodal tasks: screenshots, charts, and documents with structured outputs.
  • Mixed workloads: run simple requests at low effort and raise thinking for harder cases — one model ID.

CrossModel exposes Gemini 3.5 Flash through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.