Gemini 2.5 Pro

A mature, stable thinking-oriented flagship with a 1M context

Context window

1,048,576 tokens

Max output

64K

65,536 tokens

Released

2025-03

Topped LMArena at launch

Complex reasoning

Multi-step math, science, engineering; thinking adds stability

Long documents

1M context ingests a full document or mid-size repo at once

Web / agentic code

Requirements to runnable frontends with code execution and tools

No longer the newest generation, but a practical pick when you value reliability and long context over frontier scores.

Overview

Gemini 2.5 Pro is Google DeepMind's thinking-oriented reasoning model, released on March 25, 2025. At launch it topped the LMArena human-preference leaderboard and became the flagship of the Gemini 2.5 family, pairing deliberate "think first, then answer" reasoning with a native multimodal foundation for text, image, audio, and video input.

It is no longer Google's newest generation — Gemini 3 pushes further on speed and hard-problem density — but it remains a mature, stable, high-context reasoning model. For production systems that don't need the absolute frontier yet value reliability and a 1M-token window, 2.5 Pro is still a practical choice.

Key capabilities

Dimension	Detail
Context window	1,048,576 tokens (about 1M)
Max output	65,536 tokens (about 64K)
Input modalities	Text, image (Google's native model also supports audio and video)
Output modalities	Text
Tools	function calling, structured outputs, streaming, thinking, code execution

Gemini Pro-family requests enter a higher tier when single-request input exceeds 200K tokens (roughly 2× input and 1.5× output multipliers). This is a product pricing structure, not a per-unit price; see live rates in the model catalog.

Benchmarks

Gemini 2.5 Pro's evaluation spine is math / science reasoning plus software engineering, and those results were obtained without expensive majority-voting test-time tricks.

Reasoning Benchmarks

Front-rank math / science reasoning for early 2025

AIME 2024

92.0%

AIME 2025

86.7%

GPQA Diamond

84.0%

Humanity's Last Exam

18.8%

No tools, SOTA at the time

All scores achieved without expensive test-time tricks such as majority voting.

On math, AIME 2024 92.0% and AIME 2025 86.7%; on science, GPQA Diamond 84.0% — front-rank reasoning for early 2025. The clearest "ceiling" signal is Humanity's Last Exam 18.8%, which was the no-tools SOTA at the time, showing it had real footing on the hardest cross-disciplinary academic questions. Combined with topping LMArena at launch, 2.5 Pro earned best-in-generation marks for both accuracy and answer quality.

Software engineering and long context

Engineering & Long Context

Capable of real repo fixes and long-document retrieval

SWE-bench Verified

63.8%

Custom agent setup

Context window

Whole repo / long doc at once

LMArena

Human-preference top at launch

SWE-bench Verified uses a custom agent setup; the 1M window supports cross-section retrieval and synthesis.

SWE-bench Verified 63.8% (in a custom agent setup) is already enough to handle real-repo bug fixes. With a 1M-token window, 2.5 Pro can ingest a full long document, a mid-size codebase, or a long conversation history in a single request, then reason and retrieve across sections. Google also emphasized visually polished web-app generation and agentic coding, which is useful when requirements, design context, and existing code need to be reasoned over together.

When to use it

Complex reasoning and STEM work: multi-step math, science, and engineering problems where thinking improves stability.
Long-document and mid-size repo analysis: 1M context for cross-section retrieval and synthesis.
Web and agentic code: move from requirements to runnable frontends with code execution and tools.
Stable reasoning at controlled cost: choose it when newest-generation capability is not required.

CrossModel exposes Gemini 2.5 Pro through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.