Coding, million-token context, and native multimodality in one model
Released in June 2026 for production engineering, long-running agents, and computer use.
Overview
MiniMax M3 is MiniMax's frontier coding and agentic model, released on June 1, 2026. It combines three upgrades in one model: production-grade software engineering, a 1M-token context window powered by MiniMax Sparse Attention (MSA), and native multimodal understanding trained from the beginning rather than added later.
The public API model ID is MiniMax-M3. It is available through both OpenAI-compatible and Anthropic-compatible APIs, with text, image, and video input. Thinking is optional: it is off by default and can be enabled with adaptive thinking for harder reasoning and long-running agent tasks.
Key capabilities
| Dimension | Detail |
|---|---|
| Context window | 1,000,000 tokens |
| Max output | 128,000 tokens |
| Input modalities | Text, image, video |
| Output modalities | Text |
| Tools | tool use, streaming, computer use, token counting |
| Reasoning | Optional adaptive thinking |
Inputs up to 512K tokens use the standard tier. Inputs above 512K use a separate long-context tier at 2x the standard input, cache-read, and output rates, and may be capacity-limited during rollout. See live pricing in the model catalog.
Coding and agents
Five scores spanning repository repair and tool orchestration
Official evaluations cover software engineering, terminal work, optimization, and MCP tool use.
M3's strongest published results focus on realistic software engineering and tool-driven work. It reaches 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, 34.8% on SWE-fficiency, 28.8% on KernelBench Hard, and 74.2% on MCP Atlas. On BrowseComp it scores 83.5, ahead of Claude Opus 4.7's 79.3 in MiniMax's comparison.
These evaluations cover repository repair, terminal execution, performance optimization, tool orchestration, and autonomous information retrieval. The positioning is broader than code generation: M3 is trained for planning, invoking tools, checking intermediate results, and continuing across long sessions.
Long context and native multimodality
Million-token context for autonomous work lasting hours
At 1M context, MSA reduces per-token compute to 1/20 of the previous architecture.
MSA reduces the cost of million-token attention. MiniMax reports that at 1M context, per-token compute is 1/20 of its previous-generation architecture, with more than 9x faster prefill and 15x faster decoding in its tests. Native multimodal training lets the same context mix source code, documents, charts, screenshots, and video.
The launch examples show why these capabilities matter together. M3 reproduced the core experiments of an ICLR 2025 outstanding paper in nearly 12 hours, producing 18 commits and 23 experimental figures. In another run it made 147 benchmark submissions and 1,959 tool calls over roughly 24 hours, improving an FP8 GEMM kernel by 9.4x without human intervention.
When to use it
- Large-repository engineering: debugging, refactoring, migration, and cross-file implementation with extensive context.
- Long-running coding agents: terminal loops, repeated validation, and performance optimization.
- Multimodal development: turning screenshots, charts, documents, or videos into code and structured analysis.
- Research automation: paper reproduction, experiment execution, log analysis, and iterative model training.
- Computer-use workflows: operating desktop applications and combining visual state with tool calls.
CrossModel exposes MiniMax M3 through OpenAI-compatible /v1/chat/completions and Anthropic-compatible /v1/messages. Current pricing is available in the model catalog.