MiniMax-M3 · Model guide

MiniMax M3

Coding, million-token context, and native multimodality in one model

Context window

long-context tier above 512K

Max output

128K

tokens

Input modalities

text / image / video

Coding

Repository repair, terminal work, and CUDA optimization

Agentic

Task planning, tool calls, and long autonomous iteration

Multimodal

Charts, screenshots, video, and code in one context

Released in June 2026 for production engineering, long-running agents, and computer use.

Overview

MiniMax M3 is MiniMax's frontier coding and agentic model, released on June 1, 2026. It combines three upgrades in one model: production-grade software engineering, a 1M-token context window powered by MiniMax Sparse Attention (MSA), and native multimodal understanding trained from the beginning rather than added later.

The public API model ID is MiniMax-M3. It is available through both OpenAI-compatible and Anthropic-compatible APIs, with text, image, and video input. Thinking is optional: it is off by default and can be enabled with adaptive thinking for harder reasoning and long-running agent tasks.

Key capabilities

Dimension	Detail
Context window	1,000,000 tokens
Max output	128,000 tokens
Input modalities	Text, image, video
Output modalities	Text
Tools	tool use, streaming, computer use, token counting
Reasoning	Optional adaptive thinking

Inputs up to 512K tokens use the standard tier. Inputs above 512K use a separate long-context tier at 2x the standard input, cache-read, and output rates, and may be capacity-limited during rollout. See live pricing in the model catalog.

Coding and agents

Coding & Agentic

Five scores spanning repository repair and tool orchestration

SWE-Bench Pro

59.0%

Terminal-Bench 2.1

66.0%

SWE-fficiency

34.8%

KernelBench Hard

28.8%

MCP Atlas

74.2%

Official evaluations cover software engineering, terminal work, optimization, and MCP tool use.

M3's strongest published results focus on realistic software engineering and tool-driven work. It reaches 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, 34.8% on SWE-fficiency, 28.8% on KernelBench Hard, and 74.2% on MCP Atlas. On BrowseComp it scores 83.5, ahead of Claude Opus 4.7's 79.3 in MiniMax's comparison.

These evaluations cover repository repair, terminal execution, performance optimization, tool orchestration, and autonomous information retrieval. The positioning is broader than code generation: M3 is trained for planning, invoking tools, checking intermediate results, and continuing across long sessions.

Long context and native multimodality

MSA & Long-Horizon Work

Million-token context for autonomous work lasting hours

Prefill speedup

>9x

Decode speedup

>15x

Paper reproduction

12h

18 commits / 23 figures

CUDA optimization

9.4x

1,959 tool calls

Read

Load papers, code, charts, and experiment logs

Plan

Decompose work and choose tools and parallel paths

Run

Execute experiments, benchmarks, and validation

Improve

Patch code from feedback and continue iterating

At 1M context, MSA reduces per-token compute to 1/20 of the previous architecture.

MSA reduces the cost of million-token attention. MiniMax reports that at 1M context, per-token compute is 1/20 of its previous-generation architecture, with more than 9x faster prefill and 15x faster decoding in its tests. Native multimodal training lets the same context mix source code, documents, charts, screenshots, and video.

The launch examples show why these capabilities matter together. M3 reproduced the core experiments of an ICLR 2025 outstanding paper in nearly 12 hours, producing 18 commits and 23 experimental figures. In another run it made 147 benchmark submissions and 1,959 tool calls over roughly 24 hours, improving an FP8 GEMM kernel by 9.4x without human intervention.

When to use it

Large-repository engineering: debugging, refactoring, migration, and cross-file implementation with extensive context.
Long-running coding agents: terminal loops, repeated validation, and performance optimization.
Multimodal development: turning screenshots, charts, documents, or videos into code and structured analysis.
Research automation: paper reproduction, experiment execution, log analysis, and iterative model training.
Computer-use workflows: operating desktop applications and combining visual state with tool calls.

CrossModel exposes MiniMax M3 through OpenAI-compatible /v1/chat/completions and Anthropic-compatible /v1/messages. Current pricing is available in the model catalog.