CrossModel
Back to model catalog

Moonshot · Model guide

Kimi K2.6

moonshot/kimi-k2.6
Modalities
TextImageVideoText
Context
262K
Max output
33K
Kimi K2.6

The newer open generation for long-horizon coding and agents

Context window
256K
262,144 tokens
Max output
32,768
tokens
Agent Swarm
300
sub-agents / 4,000 steps
Input modalities
Text+image
native model also does video
Long-horizon engineering
Cross-file edits, DevOps, performance tuning
Autonomous agent flows
Search, tool use, multi-step validation
Complex production
Long docs, decks, sheets, reports, websites

Kimi docs recommend kimi-k2.6 for new projects; it supports a thinking on/off parameter.

Overview

Kimi K2.6 is Moonshot AI / Kimi's newer open model for coding, visual understanding, and agent workflows. Kimi positions it as its strongest and most general current model: more stable in long-running code tasks, better at instruction following and self-correction, with text, image, and video input plus Thinking / Non-Thinking operating modes.

Compared with K2.5, K2.6 is less about one-turn benchmark lift and more about making long autonomous execution reliable. Kimi's docs recommend kimi-k2.6 for new projects.

Key capabilities

DimensionDetail
Context window262,144 tokens (about 256K)
Max output32,768 tokens
Input modalitiesText, image (native model also supports video input)
Output modalitiesText
Toolsstreaming, JSON output, tool calls, web search, Thinking / Non-Thinking

Kimi K2.6 supports a thinking parameter. When thinking is enabled, Kimi recommends tool calls with auto or none, and retaining prior reasoning_content across multi-step tool calls. See live pricing in the model catalog.

Benchmarks and improvements

Kimi's release material focuses on long engineering tasks and agentic coding rather than one-turn answer quality.

Engineering Gains

Reliability gains for long-running execution

Code accuracy
+12%
vs K2.5 (CodeBuddy)
Long-context stability
+18%
vs K2.5 (CodeBuddy)
Tool-call success
96.60%
CodeBuddy
Next.js benchmark
+50%
Vercel observed

Figures from CodeBuddy / Factory AI / Vercel evals cited on the official release page.

CodeBuddy reports 12% higher code-generation accuracy, 18% better long-context stability, and 96.60% tool-call success versus K2.5. Factory AI reports about 15% improvement on its benchmark, and Vercel observed more than 50% improvement on a Next.js benchmark. The headline examples are about sustained reliability: K2.6 has executed 4,000+ tool calls in a 12-hour task, optimizing local Qwen3.5-0.8B inference from about 15 tokens/s to about 193 tokens/s, and has run 13-hour financial matching-engine optimization with thousands of tool calls and 4,000+ lines changed.

Agent Swarm and long execution

Agent Swarm

Larger-scale parallel orchestration and long chains

Sub-agents
300
K2.5 was 100
Steps
4,000
K2.5 was 1,500
Tool calls / task
4,000+
12-hour long task case

Good for splitting deep research, document and website generation, and spreadsheet analysis into parallel subtasks.

K2.6 expands Agent Swarm from K2.5's 100 sub-agents / 1,500 steps to as many as 300 sub-agents / 4,000 steps, useful for deep research, document and website generation, spreadsheet analysis, and long-form writing where many parallel subtasks need orchestration. Kimi also shows a 5-day autonomous-ops worklog managing monitoring, incident response, and system operations; on Claw Bench — covering coding, IM-ecosystem integration, information research, scheduled tasks, and memory use — K2.6 clearly outperforms K2.5 on task completion and tool-call accuracy.

When to use it

  • Long-horizon software engineering: refactors, DevOps, performance tuning, frontend generation, and multilingual code.
  • Multimodal development input: screenshots, designs, video snippets, and text requirements together.
  • Autonomous agent workflows: search, tool use, code execution, and multi-step validation.
  • Complex content production: long documents, decks, spreadsheets, research reports, and websites.

CrossModel exposes Kimi K2.6 through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.