The newer open generation for long-horizon coding and agents
Kimi docs recommend kimi-k2.6 for new projects; it supports a thinking on/off parameter.
Overview
Kimi K2.6 is Moonshot AI / Kimi's newer open model for coding, visual understanding, and agent workflows. Kimi positions it as its strongest and most general current model: more stable in long-running code tasks, better at instruction following and self-correction, with text, image, and video input plus Thinking / Non-Thinking operating modes.
Compared with K2.5, K2.6 is less about one-turn benchmark lift and more about making long autonomous execution reliable. Kimi's docs recommend kimi-k2.6 for new projects.
Key capabilities
| Dimension | Detail |
|---|---|
| Context window | 262,144 tokens (about 256K) |
| Max output | 32,768 tokens |
| Input modalities | Text, image (native model also supports video input) |
| Output modalities | Text |
| Tools | streaming, JSON output, tool calls, web search, Thinking / Non-Thinking |
Kimi K2.6 supports a
thinkingparameter. When thinking is enabled, Kimi recommends tool calls withautoornone, and retaining priorreasoning_contentacross multi-step tool calls. See live pricing in the model catalog.
Benchmarks and improvements
Kimi's release material focuses on long engineering tasks and agentic coding rather than one-turn answer quality.
Reliability gains for long-running execution
Figures from CodeBuddy / Factory AI / Vercel evals cited on the official release page.
CodeBuddy reports 12% higher code-generation accuracy, 18% better long-context stability, and 96.60% tool-call success versus K2.5. Factory AI reports about 15% improvement on its benchmark, and Vercel observed more than 50% improvement on a Next.js benchmark. The headline examples are about sustained reliability: K2.6 has executed 4,000+ tool calls in a 12-hour task, optimizing local Qwen3.5-0.8B inference from about 15 tokens/s to about 193 tokens/s, and has run 13-hour financial matching-engine optimization with thousands of tool calls and 4,000+ lines changed.
Agent Swarm and long execution
Larger-scale parallel orchestration and long chains
Good for splitting deep research, document and website generation, and spreadsheet analysis into parallel subtasks.
K2.6 expands Agent Swarm from K2.5's 100 sub-agents / 1,500 steps to as many as 300 sub-agents / 4,000 steps, useful for deep research, document and website generation, spreadsheet analysis, and long-form writing where many parallel subtasks need orchestration. Kimi also shows a 5-day autonomous-ops worklog managing monitoring, incident response, and system operations; on Claw Bench — covering coding, IM-ecosystem integration, information research, scheduled tasks, and memory use — K2.6 clearly outperforms K2.5 on task completion and tool-call accuracy.
When to use it
- Long-horizon software engineering: refactors, DevOps, performance tuning, frontend generation, and multilingual code.
- Multimodal development input: screenshots, designs, video snippets, and text requirements together.
- Autonomous agent workflows: search, tool use, code execution, and multi-step validation.
- Complex content production: long documents, decks, spreadsheets, research reports, and websites.
CrossModel exposes Kimi K2.6 through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.