An open native multimodal agent model
K2.5 and K2.6 are both multimodal Kimi API models; K2.6 is the newer default, K2.5 stays useful for reproduction and regression.
Overview
Kimi K2.5 is an open, natively multimodal agent model from Moonshot AI (Kimi), released on January 27, 2026. It continues training from Kimi K2 on roughly 15T mixed vision-text tokens, bringing text, image, and video input into one workflow for visual coding, long-form research, document production, and agent swarms.
The key point is that image-to-code generation and parallel multi-agent execution live in the same model. K2.5 generates frontend interfaces from screenshots, videos, or natural-language descriptions, and it can dynamically split complex tasks across sub-agents for search, analysis, generation, and packaging.
Key capabilities
| Dimension | Detail |
|---|---|
| Context window | 262,144 tokens (about 256K) |
| Max output | 32,768 tokens |
| Input modalities | Text, image (native model also supports video input) |
| Output modalities | Text |
| Features | Streaming, JSON output, tool calls, web search, Thinking / Non-Thinking modes |
K2.6 has the same 256K context window and is the newer default; K2.5 remains useful for reproduction, regression comparison, and existing workflows. See live pricing in the model catalog.
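The limits in the table above can be turned into a quick budget check before sending a long prompt. This is a minimal sketch that assumes prompt and completion tokens share the same 262,144-token window, which the page does not state explicitly; `max_prompt_tokens` is a hypothetical helper, not part of any Kimi SDK.

```python
CONTEXT_WINDOW = 262_144  # tokens, from the capability table (256 * 1024)
MAX_OUTPUT = 32_768       # tokens, from the capability table


def max_prompt_tokens(requested_output: int = MAX_OUTPUT) -> int:
    """Return the largest prompt that still leaves room for the requested output.

    Assumption: input and output tokens count against one shared context window.
    """
    if not 0 < requested_output <= MAX_OUTPUT:
        raise ValueError(f"requested_output must be in 1..{MAX_OUTPUT}")
    return CONTEXT_WINDOW - requested_output


if __name__ == "__main__":
    # Reserving the full output budget leaves 229,376 tokens for the prompt.
    print(max_prompt_tokens())
```

Under that assumption, reserving a smaller completion frees more room for documents, screenshots, or tool results in the prompt.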
Benchmarks
Kimi's technical report describes K2.5 as an open-source multimodal agentic model that jointly optimizes vision, tools, and code generation inside one agent loop, rather than a chat model with image input bolted on.
Agents, coding, image, and video optimized together
Figures from the Kimi K2.5 technical report; vision and text abilities reinforce each other.
Reported scores span agents, coding, image, and video:
| Category | Benchmark | Score |
|---|---|---|
| Agents | HLE-Full | 50.2 |
| Agents | BrowseComp | 74.9 |
| Agents | DeepSearchQA | 77.1 |
| Coding | SWE-Bench Verified | 76.8 |
| Coding | SWE-Bench Multilingual | 73.0 |
| Image | MMMU Pro | 78.5 |
| Image | MathVision | 84.2 |
| Image | OmniDocBench 1.5 | 88.8 |
| Video | VideoMMMU | 86.6 |
| Video | LongVideoBench | 79.8 |
The report credits joint text-vision pretraining, zero-vision SFT, and joint text-vision RL for letting text reasoning and vision reinforce each other.
Agent Swarm
Self-orchestrated parallel multi-agent execution
The model splits roles dynamically per task — good for long reports, bulk gathering, and multi-file deliverables.
Agent Swarm is K2.5's signature feature. Kimi describes self-orchestration of up to 100 sub-agents and up to 1,500 tool calls, with latency up to 4.5x lower than a single-agent baseline. The workflow is not hard-coded: the model splits roles and subproblems on its own, which suits long reports, bulk information gathering, website generation, and multi-file deliverables. Kimi's product pages apply it to Docs, Slides, Sheets, Websites, and Deep Research, where outputs are previewable, downloadable, and editable.
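To see why parallel fan-out shortens wall-clock time, consider the toy sketch below. It is not Kimi's implementation: in Agent Swarm the model decides the roles itself, whereas here the roles and delays are hard-coded stand-ins, and `run_sub_agent`, `fan_out`, and `run_swarm` are hypothetical names used only for illustration.

```python
import asyncio


async def run_sub_agent(role: str, delay: float) -> str:
    """Stand-in for one sub-agent; the sleep simulates search or generation work."""
    await asyncio.sleep(delay)
    return f"{role}: done"


async def fan_out(roles: dict[str, float]) -> list[str]:
    """Run all sub-agents concurrently and return results in input order."""
    tasks = [run_sub_agent(role, delay) for role, delay in roles.items()]
    return await asyncio.gather(*tasks)


def run_swarm(roles: dict[str, float]) -> list[str]:
    """Synchronous entry point for the toy swarm."""
    return asyncio.run(fan_out(roles))


if __name__ == "__main__":
    # Wall-clock time is close to the slowest task, not the sum of all delays.
    print(run_swarm({"search": 0.2, "analysis": 0.3, "generation": 0.4, "packaging": 0.1}))
```

Run serially, the four simulated tasks would take about the sum of their delays; run concurrently, they finish in roughly the time of the slowest one. Real speedups depend on how independent the subproblems are.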
When to use it
- Visual-to-code workflows: generate frontend pages and interactions from screenshots, videos, or design descriptions.
- Multi-document production: reports, contracts, research docs, slides, and spreadsheets.
- Deep research and batch analysis: use Agent Swarm to parallelize search and synthesis.
- Migration testing: compare K2.5 and K2.6 before moving existing prompts or agent scaffolds.
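For the migration-testing case, one simple approach is to replay the same prompts against both models and flag the answers that diverge most. The sketch below assumes the model ids `kimi-k2.5` and `kimi-k2.6` (check the model catalog for the real ids) and takes a caller-supplied `call_model` function, so it makes no claim about any particular client library. Text similarity is only a rough triage signal, not a quality metric.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable


def compare_models(
    prompts: Iterable[str],
    call_model: Callable[[str, str], str],
    models: tuple[str, str] = ("kimi-k2.5", "kimi-k2.6"),  # hypothetical ids
    threshold: float = 0.8,
) -> list[dict]:
    """Return the prompts whose two answers are less similar than `threshold`."""
    old, new = models
    flagged = []
    for prompt in prompts:
        old_answer = call_model(old, prompt)
        new_answer = call_model(new, prompt)
        # ratio() is 1.0 for identical strings and 0.0 for nothing in common.
        similarity = SequenceMatcher(None, old_answer, new_answer).ratio()
        if similarity < threshold:
            flagged.append(
                {
                    "prompt": prompt,
                    "similarity": round(similarity, 3),
                    old: old_answer,
                    new: new_answer,
                }
            )
    return flagged
```

Flagged prompts are the ones worth reading by hand before moving a prompt set or agent scaffold to the newer model.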
CrossModel exposes Kimi K2.5 through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.