A tiered-reasoning MoE model built for high-concurrency product use
`reasoning_effort` supports no_think / low / high, letting teams trade response speed for deeper reasoning.
Overview
Hy3 Preview, the preview release of Tencent Hunyuan's Hy3 family, opened weights on April 23, 2026, followed by Tencent's announcement on April 24. It is the first major model Tencent released after rebuilding its pretraining and reinforcement-learning infrastructure, and Tencent describes it as the smartest Hy model so far, with clear gains in reasoning, instruction following, in-context learning, coding, and agent behavior.
The model uses a 295B total / 21B active MoE architecture with 192 experts and top-8 activation, plus a 3.8B MTP layer for speculative decoding. It is already integrated into Tencent products such as Yuanbao, ima, CodeBuddy, WorkBuddy, QQ, QQ Browser, and Tencent Docs, and is available through Tencent Cloud TokenHub. Tencent reports 47% lower end-to-end latency, 54% lower first-token latency, and request success above 99.99% in internal product testing.
Key capabilities
| Dimension | Detail |
|---|---|
| Context window | 256,000 tokens |
| Max input | 192,000 tokens |
| Max output | 128,000 tokens |
| Input modalities | Text |
| Output modalities | Text |
| Architecture | 295B total / 21B active MoE (192 experts, top-8) + 3.8B MTP |
| Tools | deep thinking, function calling, JSON output, streaming, cache, MCP |
Hy3 exposes
reasoning_effortwithno_think,low, andhigh. TokenHub uses input-length tiers around 16K and 32K tokens and supports cached input tiers. See live pricing in the model catalog.
Architecture and efficiency
Hy3 Preview is a dense-MoE hybrid decoder-only model: the first layer uses dense FFN, later MoE layers route each token to 8 experts, and the architecture uses sigmoid routing, QK-Norm, and GQA (64 attention heads / 8 KV heads). The MTP layer supports speculative decoding, and Tencent reports about 40% inference-efficiency improvement over Hy2.
Benchmarks
Hy3's evaluation story spans STEM reasoning, long context and instruction following, coding, and agents. The figures show high / low reasoning-effort results.
STEM and reasoning
Strong math and structured reasoning for its class
Numbers show high / low reasoning effort; contest and STEM tasks are Hy3 Preview strengths.
Hy3 reaches 87.2 / 80.9 on GPQA-Diamond, 84.3 / 74.9 on IMO Answer Bench, and 70.0 / 63.3 on FrontierScience Olympiad. It is especially strong on Chinese competition-style math, including 88.4 / 66.5 on the Tsinghua Qiuzhen exam and 87.8 / 82.9 on CHSBO 2025.
Context, instruction following, coding, and agents
Stable long-context retrieval and complex instruction following
Numbers show high / low effort; higher effort is especially useful on Chinese long-horizon tasks.
Hy3 scores 66.3 / 56.0 on AA-LCR, 65.4 / 56.4 on LongBench v2, and 79.5 / 72.4 on AdvancedIF.
The biggest upgrade over Hy2
Parenthetical context is Hy2; coding, terminal, and search capabilities all make a generational jump.
Compared with Hy2, coding and agent results jump sharply: SWE-bench Verified rises from 53.0% to 74.4%, Terminal-Bench 2.0 from 23.2% to 54.4%, BrowseComp from 28.7% to 67.1%, and WideSearch from 53.9% to 70.2%. Tencent also reports stable agent workflows up to 495 steps.
When to use it
- Agent workflows: long multi-tool automation with MCP orchestration.
- Coding assistants: code reading, editing, debugging, and cross-file repair.
- Hard reasoning: math, STEM, and contest-style problems with
higheffort. - Long-document processing: 256K context, 192K max input, and cache tiers.
- Chinese product integrations: Tencent's own products provide useful deployment proof points.
CrossModel exposes Hy3 Preview through OpenAI-compatible /v1/chat/completions and Anthropic-compatible /v1/messages. Current pricing is available in the model catalog.