Kimi K2.5 · Model guide

Kimi K2.5

An open native multimodal agent model

Context window

256K

262,144 tokens

Max output

32,768

tokens

Continued training

15T

mixed vision-text tokens

Input modalities

Text+image

native model also does video

Visual-to-code

Frontend UI from screenshots / video / text

Agent Swarm

Dynamically split sub-agents to run in parallel

Multi-doc production

Reports, decks, sheets, websites — editable

K2.5 and K2.6 are both multimodal Kimi API models; K2.6 is the newer default, K2.5 stays useful for reproduction and regression.

Overview

Kimi K2.5 is Moonshot AI / Kimi's open native multimodal agent model, released on January 27, 2026. It continues training from Kimi K2 with roughly 15T mixed vision-text tokens, bringing text, image, and video input into one workflow for visual coding, long-form research, document production, and agent swarms.

What matters is that "image to code" and "parallel multi-agent execution" sit on one capability surface. K2.5 generates frontend interfaces from screenshots, videos, or natural-language descriptions, and it can dynamically split complex tasks into sub-agents for search, analysis, generation, and packaging.

Key capabilities

Dimension	Detail
Context window	262,144 tokens (about 256K)
Max output	32,768 tokens
Input modalities	Text, image (native model also supports video input)
Output modalities	Text
Tools	streaming, JSON output, tool calls, web search, Thinking / Non-Thinking

Kimi K2.5 and K2.6 are both multimodal Kimi API models with 256K context. K2.6 is the newer default; K2.5 remains useful for reproduction, regression comparison, and existing workflows. See live pricing in the model catalog.

Benchmarks

Kimi's technical report defines K2.5 as an open-source multimodal agentic model that jointly optimizes vision, tools, and code generation inside one agent loop — not a chat model with image input bolted on.

Multi-domain Benchmarks

Agents, coding, image, and video optimized together

BrowseComp

74.9

SWE-Bench Verified

76.8

MMMU Pro

78.5

VideoMMMU

86.6

Figures from the Kimi K2.5 technical report; vision and text abilities reinforce each other.

Reported scores span agents, coding, image, and video: HLE-Full 50.2, BrowseComp 74.9, DeepSearchQA 77.1; SWE-Bench Verified 76.8, SWE-Bench Multilingual 73.0; MMMU Pro 78.5, MathVision 84.2, OmniDocBench 1.5 88.8; VideoMMMU 86.6, LongVideoBench 79.8. The report credits joint text-vision pretraining, zero-vision SFT, and joint text-vision RL for letting text reasoning and vision reinforce each other.

Agent Swarm

Self-orchestrated parallel multi-agent execution

Sub-agents

100

self-orchestration cap

Tool calls

1,500

executed in parallel

Latency

4.5×

lower than single-agent

The model splits roles dynamically per task — good for long reports, bulk gathering, and multi-file deliverables.

Agent Swarm is K2.5's signature feature. Kimi describes self-orchestration of up to 100 sub-agents and 1,500 tool calls, with up to 4.5x lower latency versus a single-agent baseline. It is not a hard-coded workflow — the model splits roles and subproblems on its own, which suits long reports, bulk information gathering, website generation, and multi-file deliverables. Kimi's product pages apply it to Docs, Slides, Sheets, Websites, and Deep Research, where outputs are previewable, downloadable, and editable.

When to use it

Visual-to-code workflows: generate frontend pages and interactions from screenshots, videos, or design descriptions.
Multi-document production: reports, contracts, research docs, slides, and spreadsheets.
Deep research and batch analysis: use Agent Swarm to parallelize search and synthesis.
Migration testing: compare K2.5 and K2.6 before moving existing prompts or agent scaffolds.

CrossModel exposes Kimi K2.5 through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.