A 1M-context flagship built for long-horizon tasks
IndexShare shares one indexer across every four sparse-attention layers, cutting per-token FLOPs by 2.9× at 1M context.
Overview
GLM-5.2 is Zhipu AI's flagship model for long-horizon tasks, released on June 17, 2026 under the MIT license. It keeps the GLM-5 MoE lineage — 753B total parameters with ~40B active — but makes a step change in context: GLM-5.2 stably sustains a 1M-token working context, up from GLM-5.1's 200K.
The headline architectural change is IndexShare: every four sparse-attention layers share a single lightweight indexer placed on the first layer, and the top-k indices are reused across the other three. Combined with KVShare and a refined MTP layer, this cuts per-token FLOPs by 2.9× at 1M context and improves speculative-decoding acceptance length by up to 20%, so the long window stays affordable to serve.
Key capabilities
| Dimension | Detail |
|---|---|
| Context window | 1,000,000 tokens (1M) |
| Max output | 128,000 tokens |
| Input modalities | Text |
| Output modalities | Text |
| Tools | streaming, JSON output, tool calls, High / Max effort levels |
GLM-5.2 exposes High and Max thinking effort levels, letting you trade model capability against latency and compute cost per request. Max spends more internal reasoning for the hardest engineering and math work; High is the faster default for interactive coding. See live pricing in the model catalog.
Benchmarks
GLM-5.2's evaluation axis is long-horizon engineering: resolving real repository issues and driving terminals over many steps, not single-turn prompts.
Closing in on closed-source flagships on long-horizon coding
Numbers from the official launch blog; detail lines show comparison models.
On SWE-bench Pro, GLM-5.2 scores 62.1, ahead of GPT-5.5 (58.6) and its own predecessor GLM-5.1 (58.4). On FrontierSWE it reaches 74.4, edging past GPT-5.5 (72.6) and finishing in a near-tie with Claude Opus 4.8 (75.1). Terminal Bench 2.1 (Terminus-2) climbs to 81.0, a large jump from GLM-5.1's 63.5, and ProgramBench rises to 63.7 from 50.9 — the clearest signal that the gains are about sustained, tool-driven execution rather than one-shot code.
Agents, tools, and reasoning
Tool use and math reasoning rise together
Numbers from the official launch blog; detail lines show comparison models.
On the MCP-Atlas tool-usage public set, GLM-5.2 scores 76.8, ahead of GPT-5.5 (75.3) and just behind Claude Opus 4.8 (77.8); Tool-Decathlon improves to 48.2 from GLM-5.1's 40.7. Reasoning rises in lockstep: 99.2 on AIME 2026 and 91.2 on GPQA-Diamond, both well above GLM-5.1 (95.3 / 86.2). The pattern across coding, agents, and math is consistent — GLM-5.2 narrows the gap to the leading closed models while remaining open-weight.
When to use it
- Million-token codebases: whole-repo reading, cross-file refactors, and migrations that overflow a 200K window.
- Long-horizon agents: multi-step tool chains where MCP-Atlas / Tool-Decathlon stability matters more than single-turn quality.
- Hard reasoning and math: competition-level problems and research-style analysis where Max effort pays off.
- Open-weight deployment: teams that need MIT-licensed weights they can self-host and fine-tune.
CrossModel exposes GLM-5.2 through an OpenAI-compatible /v1/chat/completions API. Current pricing is available in the model catalog.