CrossModel
Back to model catalog

Moonshot · Model guide

Kimi K2.5

moonshot/kimi-k2.5
Modalities
TextImageVideoText
Context
262K
Max output
33K
Kimi K2.5

An open native multimodal agent model

Context window
256K
262,144 tokens
Max output
32,768
tokens
Continued training
15T
mixed vision-text tokens
Input modalities
Text+image
native model also does video
Visual-to-code
Frontend UI from screenshots / video / text
Agent Swarm
Dynamically split sub-agents to run in parallel
Multi-doc production
Reports, decks, sheets, websites — editable

K2.5 and K2.6 are both multimodal Kimi API models; K2.6 is the newer default, K2.5 stays useful for reproduction and regression.

Overview

Kimi K2.5 is Moonshot AI / Kimi's open native multimodal agent model, released on January 27, 2026. It continues training from Kimi K2 with roughly 15T mixed vision-text tokens, bringing text, image, and video input into one workflow for visual coding, long-form research, document production, and agent swarms.

What matters is that "image to code" and "parallel multi-agent execution" sit on one capability surface. K2.5 generates frontend interfaces from screenshots, videos, or natural-language descriptions, and it can dynamically split complex tasks into sub-agents for search, analysis, generation, and packaging.

Key capabilities

DimensionDetail
Context window262,144 tokens (about 256K)
Max output32,768 tokens
Input modalitiesText, image (native model also supports video input)
Output modalitiesText
Toolsstreaming, JSON output, tool calls, web search, Thinking / Non-Thinking

Kimi K2.5 and K2.6 are both multimodal Kimi API models with 256K context. K2.6 is the newer default; K2.5 remains useful for reproduction, regression comparison, and existing workflows. See live pricing in the model catalog.

Benchmarks

Kimi's technical report defines K2.5 as an open-source multimodal agentic model that jointly optimizes vision, tools, and code generation inside one agent loop — not a chat model with image input bolted on.

Multi-domain Benchmarks

Agents, coding, image, and video optimized together

BrowseComp
74.9
SWE-Bench Verified
76.8
MMMU Pro
78.5
VideoMMMU
86.6

Figures from the Kimi K2.5 technical report; vision and text abilities reinforce each other.

Reported scores span agents, coding, image, and video: HLE-Full 50.2, BrowseComp 74.9, DeepSearchQA 77.1; SWE-Bench Verified 76.8, SWE-Bench Multilingual 73.0; MMMU Pro 78.5, MathVision 84.2, OmniDocBench 1.5 88.8; VideoMMMU 86.6, LongVideoBench 79.8. The report credits joint text-vision pretraining, zero-vision SFT, and joint text-vision RL for letting text reasoning and vision reinforce each other.

Agent Swarm

Agent Swarm

Self-orchestrated parallel multi-agent execution

Sub-agents
100
self-orchestration cap
Tool calls
1,500
executed in parallel
Latency
4.5×
lower than single-agent

The model splits roles dynamically per task — good for long reports, bulk gathering, and multi-file deliverables.

Agent Swarm is K2.5's signature feature. Kimi describes self-orchestration of up to 100 sub-agents and 1,500 tool calls, with up to 4.5x lower latency versus a single-agent baseline. It is not a hard-coded workflow — the model splits roles and subproblems on its own, which suits long reports, bulk information gathering, website generation, and multi-file deliverables. Kimi's product pages apply it to Docs, Slides, Sheets, Websites, and Deep Research, where outputs are previewable, downloadable, and editable.

When to use it

  • Visual-to-code workflows: generate frontend pages and interactions from screenshots, videos, or design descriptions.
  • Multi-document production: reports, contracts, research docs, slides, and spreadsheets.
  • Deep research and batch analysis: use Agent Swarm to parallelize search and synthesis.
  • Migration testing: compare K2.5 and K2.6 before moving existing prompts or agent scaffolds.

CrossModel exposes Kimi K2.5 through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.