CrossModel
Back to model catalog

Qwen · Model guide

Qwen3.7 Max

qwen/qwen3.7-max
Modalities
TextText
Context
1M
Max output
66K
Qwen3.7 Max

The Qwen flagship for hardest reasoning, coding, and long-running agents

Context window
1M
tokens
Max output
65.53K
tokens
Thinking budget
256K
tokens
Complex coding
Large repos, debugging, architecture planning, and code agents
Office work
Long documents, reports, contracts, and cross-source synthesis
Long execution
Planning, tool use, state retention, and iterative convergence

The public interface is currently text-only; use Qwen3.6 Plus or Flash for multimodal workloads.

Overview

Qwen3.7 Max is the largest and most capable model in the Qwen3.7 family. Qwen Cloud positions it as the flagship for the agent-centric era: hardest reasoning, complex coding, office productivity, and long-running autonomous execution. In CrossModel, qwen/qwen3.7-max is the Qwen tier to choose when the task has real decision cost and enough context to justify a flagship model.

The important constraint is modality. The public Qwen3.7 Max interface is currently text-only: text in, text out. It is not the right first choice for screenshot OCR, video understanding, or visual grounding; those belong to Qwen3.6 Plus or Flash. Max is better used as the final reasoning layer over documents, code, logs, requirements, and tool results.

Key capabilities

DimensionDetail
Context window1M tokens
Max input991.80K tokens
Max output65.53K tokens
Thinking budget256K tokens
Input modalitiesText
Output modalitiesText
Toolsfunction calling, structured outputs, built-in tools, cache

Qwen3.7 Max supports implicit cache, explicit cache creation/read, and session cache. Current pricing is available in the model catalog; this article intentionally avoids fixed price numbers.

Agent and tool work

Tools & Cache

Long context plus tool use, not just static answers

Built-in tools
3
web search / code interpreter / web extractor
Cache modes
3
implicit / explicit / session
Max input
991.80K
tokens

Built-in tools are exposed through the Responses API; regular function calling still uses schemas you define.

Qwen3.7 Max combines 1M context with thinking mode and tool access. The model page lists function calling, cache, structured outputs, and web search; the built-in tools section lists code_interpreter, web_extractor, and web_search through the Responses API. That makes Max suitable for agent loops that must read a large working set, plan several steps, call tools, and still produce an auditable final answer.

The 256K thinking budget is the main difference from the Qwen3.6 line. It gives Max more room for hard planning and internal deliberation before emitting a 65.53K-token answer, which matters for architecture reviews, migration plans, legal cross-references, and codebase-level debugging.

Positioning among Qwen models

Workflow Fit

Keep the hardest decisions on Max and fan out the parallel work

Qwen3.7 Max
Reason
Final reasoning and complex decisions
Qwen3.6 Plus
Balance
Everyday flagship and multimodal lead
Qwen3.6 Flash
Throughput
High-volume execution and low-cost drafts
01
Gather context
Code, requirements, logs, docs, and tool results
02
Split work
Send extraction, search, and drafts to Plus / Flash
03
Max decides
Handle high-risk reasoning, architecture, and final plans
04
Verify output
Close the loop with tests, schemas, or review

This routing pattern keeps repetitive extraction, preprocessing, and drafting off the flagship tier.

Do not put every request on Max. A practical route is to use Qwen3.6 Flash for batch extraction and first drafts, Qwen3.6 Plus for multimodal review and balanced production work, then reserve Qwen3.7 Max for the final judgment: ambiguous requirements, cross-file architecture, tool-result synthesis, and high-risk decisions.

This split is especially useful in code agents. Flash can search and summarize, Plus can handle screenshots or richer review, and Max can decide what to change, explain why, and produce a plan that survives human review.

When to use it

  • Hard reasoning and coding: architecture planning, repository-scale debugging, migrations, and long-form implementation plans.
  • Long-context synthesis: product specs, logs, docs, tickets, and tool results that need one coherent answer.
  • Agent orchestration: function calling plus built-in tools where the model must reason across intermediate state.
  • Final review layer: escalation target for difficult samples from Qwen3.6 Flash or Plus.

CrossModel exposes Qwen3.7 Max through an OpenAI-compatible API. Current pricing is available in the model catalog.