The fastest, cheapest, lowest-latency tier in the 2.5 family
Shares the 1M context and native multimodal foundation with Flash and Pro, but pushes the tradeoff toward speed and cost.
Overview
Gemini 2.5 Flash-Lite is Google DeepMind's lightweight reasoning model, released on June 17, 2025. It is the fastest, lowest-latency, lowest-cost tier in the Gemini 2.5 family, designed for large-scale calls where throughput and response time matter as much as quality.
CrossModel exposes it as gemini/gemini-2.5-flash-lite. It shares the 2.5 family's 1M context and native multimodal foundation with Flash and Pro, but pushes the tradeoff toward speed and cost.
Key capabilities
| Dimension | Detail |
|---|---|
| Context window | 1,048,576 tokens (about 1M) |
| Max output | 65,536 tokens (about 64K) |
| Input modalities | Text, image (Google's native model also supports audio and video) |
| Output modalities | Text |
| Tools | function calling, structured outputs, streaming, optional thinking |
Flash-Lite uses a single low pricing tier in the 2.5 family, which suits frequent production calls. See live rates in the model catalog.
Position in the 2.5 family
Save the expensive compute budget for genuinely hard steps
Handle high-volume, low-risk, verifiable narrow tasks on Flash-Lite first, then escalate step by step.
Flash-Lite works best on high-volume, low-risk, clearly verifiable steps. In non-thinking mode it can reach roughly 215 tokens/s, which makes it a good fit for intent classification, keyword extraction, translation, tagging, and lightweight vision. Treat it as the first layer of a router: handle the easy, high-frequency traffic here, then escalate ambiguous or failed samples to Flash (with a larger thinking budget) or to Pro (for the hardest reasoning). This keeps response time low while saving the more expensive compute budget for genuinely complex steps.
When to use it
- High-volume narrow tasks: classification, extraction, translation, tagging, and routing.
- Scaled online services: RAG preprocessing, realtime summaries, and lightweight Q&A.
- Light multimodal processing: screenshots, receipts, product images, and form photos.
- First layer of a router: handle easy traffic before escalating to Flash or Pro.
CrossModel exposes Gemini 2.5 Flash-Lite through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.