GLM-5.2 · Model guide

GLM-5.2 · Zhipu AI

A 1M-context flagship built for long-horizon tasks

Parameters

753B

~40B active MoE

Context window

1,000,000 tokens (GLM-5.1 was 200K)

SWE-bench Pro

62.1

beats GPT-5.5 (58.6)

IndexShare

2.9×

FLOPs cut at 1M context

Long-horizon engineering

SWE-bench Pro / Terminal Bench / FrontierSWE

Agents & tool use

MCP-Atlas, Tool-Decathlon

Reasoning & math

AIME 2026, GPQA-Diamond

IndexShare shares one indexer across every four sparse-attention layers, cutting per-token FLOPs by 2.9× at 1M context.

Overview

GLM-5.2 is Zhipu AI's flagship model for long-horizon tasks, released on June 17, 2026 under the MIT license. It keeps the GLM-5 MoE lineage — 753B total parameters with ~40B active — but makes a step change in context: GLM-5.2 stably sustains a 1M-token working context, up from GLM-5.1's 200K.

The headline architectural change is IndexShare: every four sparse-attention layers share a single lightweight indexer placed on the first layer, and the top-k indices are reused across the other three. Combined with KVShare and a refined MTP layer, this cuts per-token FLOPs by 2.9× at 1M context and improves speculative-decoding acceptance length by up to 20%, so the long window stays affordable to serve.

Key capabilities

Dimension	Detail
Context window	1,000,000 tokens (1M)
Max output	128,000 tokens
Input modalities	Text
Output modalities	Text
Tools	streaming, JSON output, tool calls, High / Max effort levels

GLM-5.2 exposes High and Max thinking effort levels, letting you trade model capability against latency and compute cost per request. Max spends more internal reasoning for the hardest engineering and math work; High is the faster default for interactive coding. See live pricing in the model catalog.

Benchmarks

GLM-5.2's evaluation axis is long-horizon engineering: resolving real repository issues and driving terminals over many steps, not single-turn prompts.

Coding & Terminal

Closing in on closed-source flagships on long-horizon coding

SWE-bench Pro

62.1

GLM-5.1 58.4 · GPT-5.5 58.6

FrontierSWE

74.4

Opus 4.8 75.1 · GPT-5.5 72.6

Terminal Bench 2.1

81.0

Terminus-2, GLM-5.1 63.5

ProgramBench

63.7

GLM-5.1 50.9

Numbers from the official launch blog; detail lines show comparison models.

On SWE-bench Pro, GLM-5.2 scores 62.1, ahead of GPT-5.5 (58.6) and its own predecessor GLM-5.1 (58.4). On FrontierSWE it reaches 74.4, edging past GPT-5.5 (72.6) and finishing in a near-tie with Claude Opus 4.8 (75.1). Terminal Bench 2.1 (Terminus-2) climbs to 81.0, a large jump from GLM-5.1's 63.5, and ProgramBench rises to 63.7 from 50.9 — the clearest signal that the gains are about sustained, tool-driven execution rather than one-shot code.

Agents, tools, and reasoning

Agentic & Reasoning

Tool use and math reasoning rise together

MCP-Atlas

76.8

Public Set, GPT-5.5 75.3

Tool-Decathlon

48.2

GLM-5.1 40.7

AIME 2026

99.2

GPT-5.5 98.3

GPQA-Diamond

91.2

GLM-5.1 86.2

Numbers from the official launch blog; detail lines show comparison models.

On the MCP-Atlas tool-usage public set, GLM-5.2 scores 76.8, ahead of GPT-5.5 (75.3) and just behind Claude Opus 4.8 (77.8); Tool-Decathlon improves to 48.2 from GLM-5.1's 40.7. Reasoning rises in lockstep: 99.2 on AIME 2026 and 91.2 on GPQA-Diamond, both well above GLM-5.1 (95.3 / 86.2). The pattern across coding, agents, and math is consistent — GLM-5.2 narrows the gap to the leading closed models while remaining open-weight.

When to use it

Million-token codebases: whole-repo reading, cross-file refactors, and migrations that overflow a 200K window.
Long-horizon agents: multi-step tool chains where MCP-Atlas / Tool-Decathlon stability matters more than single-turn quality.
Hard reasoning and math: competition-level problems and research-style analysis where Max effort pays off.
Open-weight deployment: teams that need MIT-licensed weights they can self-host and fine-tune.

CrossModel exposes GLM-5.2 through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.