Kimi K2.6 · Model guide

Kimi K2.6

The newer open generation for long-horizon coding and agents

Context window

256K

262,144 tokens

Max output

32,768

tokens

Agent Swarm

300

sub-agents / 4,000 steps

Input modalities

Text+image

native model also does video

Long-horizon engineering

Cross-file edits, DevOps, performance tuning

Autonomous agent flows

Search, tool use, multi-step validation

Complex production

Long docs, decks, sheets, reports, websites

Kimi docs recommend kimi-k2.6 for new projects; it supports a thinking on/off parameter.

Overview

Kimi K2.6 is Moonshot AI / Kimi's newer open model for coding, visual understanding, and agent workflows. Kimi positions it as its strongest and most general current model: more stable in long-running code tasks, better at instruction following and self-correction, with text, image, and video input plus Thinking / Non-Thinking operating modes.

Compared with K2.5, K2.6 is less about one-turn benchmark lift and more about making long autonomous execution reliable. Kimi's docs recommend kimi-k2.6 for new projects.

Key capabilities

Dimension	Detail
Context window	262,144 tokens (about 256K)
Max output	32,768 tokens
Input modalities	Text, image (native model also supports video input)
Output modalities	Text
Tools	streaming, JSON output, tool calls, web search, Thinking / Non-Thinking

Kimi K2.6 supports a thinking parameter. When thinking is enabled, Kimi recommends tool calls with auto or none, and retaining prior reasoning_content across multi-step tool calls. See live pricing in the model catalog.

Benchmarks and improvements

Kimi's release material focuses on long engineering tasks and agentic coding rather than one-turn answer quality.

Engineering Gains

Reliability gains for long-running execution

Code accuracy

+12%

vs K2.5 (CodeBuddy)

Long-context stability

+18%

vs K2.5 (CodeBuddy)

Tool-call success

96.60%

CodeBuddy

Next.js benchmark

+50%

Vercel observed

Figures from CodeBuddy / Factory AI / Vercel evals cited on the official release page.

CodeBuddy reports 12% higher code-generation accuracy, 18% better long-context stability, and 96.60% tool-call success versus K2.5. Factory AI reports about 15% improvement on its benchmark, and Vercel observed more than 50% improvement on a Next.js benchmark. The headline examples are about sustained reliability: K2.6 has executed 4,000+ tool calls in a 12-hour task, optimizing local Qwen3.5-0.8B inference from about 15 tokens/s to about 193 tokens/s, and has run 13-hour financial matching-engine optimization with thousands of tool calls and 4,000+ lines changed.

Agent Swarm and long execution

Agent Swarm

Larger-scale parallel orchestration and long chains

Sub-agents

300

K2.5 was 100

Steps

4,000

K2.5 was 1,500

Tool calls / task

4,000+

12-hour long task case

Good for splitting deep research, document and website generation, and spreadsheet analysis into parallel subtasks.

K2.6 expands Agent Swarm from K2.5's 100 sub-agents / 1,500 steps to as many as 300 sub-agents / 4,000 steps, useful for deep research, document and website generation, spreadsheet analysis, and long-form writing where many parallel subtasks need orchestration. Kimi also shows a 5-day autonomous-ops worklog managing monitoring, incident response, and system operations; on Claw Bench — covering coding, IM-ecosystem integration, information research, scheduled tasks, and memory use — K2.6 clearly outperforms K2.5 on task completion and tool-call accuracy.

When to use it

Long-horizon software engineering: refactors, DevOps, performance tuning, frontend generation, and multilingual code.
Multimodal development input: screenshots, designs, video snippets, and text requirements together.
Autonomous agent workflows: search, tool use, code execution, and multi-step validation.
Complex content production: long documents, decks, spreadsheets, research reports, and websites.

CrossModel exposes Kimi K2.6 through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.