CrossModel
Back to model catalog

Gemini · Model guide

Gemini 2.5 Pro

gemini/gemini-2.5-pro
Modalities
TextImageAudioVideoText
Context
1M
Max output
66K
Gemini 2.5 Pro

A mature, stable thinking-oriented flagship with a 1M context

Context window
1M
1,048,576 tokens
Max output
64K
65,536 tokens
Released
2025-03
Topped LMArena at launch
Complex reasoning
Multi-step math, science, engineering; thinking adds stability
Long documents
1M context ingests a full document or mid-size repo at once
Web / agentic code
Requirements to runnable frontends with code execution and tools

No longer the newest generation, but a practical pick when you value reliability and long context over frontier scores.

Overview

Gemini 2.5 Pro is Google DeepMind's thinking-oriented reasoning model, released on March 25, 2025. At launch it topped the LMArena human-preference leaderboard and became the flagship of the Gemini 2.5 family, pairing deliberate "think first, then answer" reasoning with a native multimodal foundation for text, image, audio, and video input.

It is no longer Google's newest generation — Gemini 3 pushes further on speed and hard-problem density — but it remains a mature, stable, high-context reasoning model. For production systems that don't need the absolute frontier yet value reliability and a 1M-token window, 2.5 Pro is still a practical choice.

Key capabilities

DimensionDetail
Context window1,048,576 tokens (about 1M)
Max output65,536 tokens (about 64K)
Input modalitiesText, image (Google's native model also supports audio and video)
Output modalitiesText
Toolsfunction calling, structured outputs, streaming, thinking, code execution

Gemini Pro-family requests enter a higher tier when single-request input exceeds 200K tokens (roughly input and 1.5× output multipliers). This is a product pricing structure, not a per-unit price; see live rates in the model catalog.

Benchmarks

Gemini 2.5 Pro's evaluation spine is math / science reasoning plus software engineering, and those results were obtained without expensive majority-voting test-time tricks.

Reasoning Benchmarks

Front-rank math / science reasoning for early 2025

AIME 2024
92.0%
AIME 2025
86.7%
GPQA Diamond
84.0%
Humanity's Last Exam
18.8%
No tools, SOTA at the time

All scores achieved without expensive test-time tricks such as majority voting.

On math, AIME 2024 92.0% and AIME 2025 86.7%; on science, GPQA Diamond 84.0% — front-rank reasoning for early 2025. The clearest "ceiling" signal is Humanity's Last Exam 18.8%, which was the no-tools SOTA at the time, showing it had real footing on the hardest cross-disciplinary academic questions. Combined with topping LMArena at launch, 2.5 Pro earned best-in-generation marks for both accuracy and answer quality.

Software engineering and long context

Engineering & Long Context

Capable of real repo fixes and long-document retrieval

SWE-bench Verified
63.8%
Custom agent setup
Context window
1M
Whole repo / long doc at once
LMArena
#1
Human-preference top at launch

SWE-bench Verified uses a custom agent setup; the 1M window supports cross-section retrieval and synthesis.

SWE-bench Verified 63.8% (in a custom agent setup) is already enough to handle real-repo bug fixes. With a 1M-token window, 2.5 Pro can ingest a full long document, a mid-size codebase, or a long conversation history in a single request, then reason and retrieve across sections. Google also emphasized visually polished web-app generation and agentic coding, which is useful when requirements, design context, and existing code need to be reasoned over together.

When to use it

  • Complex reasoning and STEM work: multi-step math, science, and engineering problems where thinking improves stability.
  • Long-document and mid-size repo analysis: 1M context for cross-section retrieval and synthesis.
  • Web and agentic code: move from requirements to runnable frontends with code execution and tools.
  • Stable reasoning at controlled cost: choose it when newest-generation capability is not required.

CrossModel exposes Gemini 2.5 Pro through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.