CrossModel
Back to model catalog

MiniMax · Model guide

MiniMax-M3

minimax/minimax-m3
Modalities
TextImageVideoText
Context
1M
Max output
512K
MiniMax M3

Coding, million-token context, and native multimodality in one model

Context window
1M
long-context tier above 512K
Max output
128K
tokens
Input modalities
3
text / image / video
Coding
Repository repair, terminal work, and CUDA optimization
Agentic
Task planning, tool calls, and long autonomous iteration
Multimodal
Charts, screenshots, video, and code in one context

Released in June 2026 for production engineering, long-running agents, and computer use.

Overview

MiniMax M3 is MiniMax's frontier coding and agentic model, released on June 1, 2026. It combines three upgrades in one model: production-grade software engineering, a 1M-token context window powered by MiniMax Sparse Attention (MSA), and native multimodal understanding trained from the beginning rather than added later.

The public API model ID is MiniMax-M3. It is available through both OpenAI-compatible and Anthropic-compatible APIs, with text, image, and video input. Thinking is optional: it is off by default and can be enabled with adaptive thinking for harder reasoning and long-running agent tasks.

Key capabilities

DimensionDetail
Context window1,000,000 tokens
Max output128,000 tokens
Input modalitiesText, image, video
Output modalitiesText
Toolstool use, streaming, computer use, token counting
ReasoningOptional adaptive thinking

Inputs up to 512K tokens use the standard tier. Inputs above 512K use a separate long-context tier at 2x the standard input, cache-read, and output rates, and may be capacity-limited during rollout. See live pricing in the model catalog.

Coding and agents

Coding & Agentic

Five scores spanning repository repair and tool orchestration

SWE-Bench Pro
59.0%
Terminal-Bench 2.1
66.0%
SWE-fficiency
34.8%
KernelBench Hard
28.8%
MCP Atlas
74.2%

Official evaluations cover software engineering, terminal work, optimization, and MCP tool use.

M3's strongest published results focus on realistic software engineering and tool-driven work. It reaches 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, 34.8% on SWE-fficiency, 28.8% on KernelBench Hard, and 74.2% on MCP Atlas. On BrowseComp it scores 83.5, ahead of Claude Opus 4.7's 79.3 in MiniMax's comparison.

These evaluations cover repository repair, terminal execution, performance optimization, tool orchestration, and autonomous information retrieval. The positioning is broader than code generation: M3 is trained for planning, invoking tools, checking intermediate results, and continuing across long sessions.

Long context and native multimodality

MSA & Long-Horizon Work

Million-token context for autonomous work lasting hours

Prefill speedup
>9x
Decode speedup
>15x
Paper reproduction
12h
18 commits / 23 figures
CUDA optimization
9.4x
1,959 tool calls
01
Read
Load papers, code, charts, and experiment logs
02
Plan
Decompose work and choose tools and parallel paths
03
Run
Execute experiments, benchmarks, and validation
04
Improve
Patch code from feedback and continue iterating

At 1M context, MSA reduces per-token compute to 1/20 of the previous architecture.

MSA reduces the cost of million-token attention. MiniMax reports that at 1M context, per-token compute is 1/20 of its previous-generation architecture, with more than 9x faster prefill and 15x faster decoding in its tests. Native multimodal training lets the same context mix source code, documents, charts, screenshots, and video.

The launch examples show why these capabilities matter together. M3 reproduced the core experiments of an ICLR 2025 outstanding paper in nearly 12 hours, producing 18 commits and 23 experimental figures. In another run it made 147 benchmark submissions and 1,959 tool calls over roughly 24 hours, improving an FP8 GEMM kernel by 9.4x without human intervention.

When to use it

  • Large-repository engineering: debugging, refactoring, migration, and cross-file implementation with extensive context.
  • Long-running coding agents: terminal loops, repeated validation, and performance optimization.
  • Multimodal development: turning screenshots, charts, documents, or videos into code and structured analysis.
  • Research automation: paper reproduction, experiment execution, log analysis, and iterative model training.
  • Computer-use workflows: operating desktop applications and combining visual state with tool calls.

CrossModel exposes MiniMax M3 through OpenAI-compatible /v1/chat/completions and Anthropic-compatible /v1/messages. Current pricing is available in the model catalog.