Hy3 Preview · Model guide

Tencent Hunyuan Hy3 Preview

A tiered-reasoning MoE model built for high-concurrency product use

Context window

256,000

tokens

Max output

128,000

tokens

Parameter scale

295B

21B active · MoE 192 experts

Hard reasoning

Math, STEM, and contest-style tasks with high effort

Coding & agents

Large SWE, Terminal, and retrieval gains with MCP orchestration

Long context

256K window plus cached-input tiers for retrieval and synthesis

`reasoning_effort` supports no_think / low / high, letting teams trade response speed for deeper reasoning.

Overview

Hy3 Preview, the preview release of Tencent Hunyuan's Hy3 family, opened weights on April 23, 2026, followed by Tencent's announcement on April 24. It is the first major model Tencent released after rebuilding its pretraining and reinforcement-learning infrastructure, and Tencent describes it as the smartest Hy model so far, with clear gains in reasoning, instruction following, in-context learning, coding, and agent behavior.

The model uses a 295B total / 21B active MoE architecture with 192 experts and top-8 activation, plus a 3.8B MTP layer for speculative decoding. It is already integrated into Tencent products such as Yuanbao, ima, CodeBuddy, WorkBuddy, QQ, QQ Browser, and Tencent Docs, and is available through Tencent Cloud TokenHub. Tencent reports 47% lower end-to-end latency, 54% lower first-token latency, and request success above 99.99% in internal product testing.

Key capabilities

Dimension	Detail
Context window	256,000 tokens
Max input	192,000 tokens
Max output	128,000 tokens
Input modalities	Text
Output modalities	Text
Architecture	295B total / 21B active MoE (192 experts, top-8) + 3.8B MTP
Tools	deep thinking, function calling, JSON output, streaming, cache, MCP

Hy3 exposes reasoning_effort with no_think, low, and high. TokenHub uses input-length tiers around 16K and 32K tokens and supports cached input tiers. See live pricing in the model catalog.

Architecture and efficiency

Hy3 Preview is a dense-MoE hybrid decoder-only model: the first layer uses dense FFN, later MoE layers route each token to 8 experts, and the architecture uses sigmoid routing, QK-Norm, and GQA (64 attention heads / 8 KV heads). The MTP layer supports speculative decoding, and Tencent reports about 40% inference-efficiency improvement over Hy2.

Benchmarks

Hy3's evaluation story spans STEM reasoning, long context and instruction following, coding, and agents. The figures show high / low reasoning-effort results.

STEM and reasoning

STEM & Reasoning

Strong math and structured reasoning for its class

GPQA-Diamond

87.2 / 80.9

Kimi-K2.5 87.6 · GPT-5.4 92.8

IMO Answer Bench

84.3 / 74.9

GLM-5 82.5 · Gemini-3.1 89.2

Qiuzhen exam avg@3

88.4 / 66.5

GLM-5 81.5 · Kimi-K2.5 77.7

CHSBO 2025

87.8 / 82.9

GPT-5.4 85.4 · Kimi-K2.5 70.7

Numbers show high / low reasoning effort; contest and STEM tasks are Hy3 Preview strengths.

Hy3 reaches 87.2 / 80.9 on GPQA-Diamond, 84.3 / 74.9 on IMO Answer Bench, and 70.0 / 63.3 on FrontierScience Olympiad. It is especially strong on Chinese competition-style math, including 88.4 / 66.5 on the Tsinghua Qiuzhen exam and 87.8 / 82.9 on CHSBO 2025.

Context, instruction following, coding, and agents

Long Context & Instruction Following

Stable long-context retrieval and complex instruction following

AA-LCR

66.3 / 56.0

Kimi-K2.5 65.3 · Gemini-3.1 72.7

LongBench v2

65.4 / 56.4

Kimi-K2.5 65.6 · GPT-5.4 67.4

AdvancedIF

79.5 / 72.4

Kimi-K2.5 78.5 · GPT-5.4 83.4

CL-bench Life

15.7 / 8.5

GLM-5/Kimi 13.0 · GPT-5.4 19.2

Numbers show high / low effort; higher effort is especially useful on Chinese long-horizon tasks.

Hy3 scores 66.3 / 56.0 on AA-LCR, 65.4 / 56.4 on LongBench v2, and 79.5 / 72.4 on AdvancedIF.

Coding & Agents

The biggest upgrade over Hy2

SWE-bench Verified

74.4%

Hy2 53.0%

Terminal-Bench 2.0

54.4%

Hy2 23.2%

BrowseComp

67.1%

Hy2 28.7%

WideSearch

70.2%

Hy2 53.9%

Parenthetical context is Hy2; coding, terminal, and search capabilities all make a generational jump.

Compared with Hy2, coding and agent results jump sharply: SWE-bench Verified rises from 53.0% to 74.4%, Terminal-Bench 2.0 from 23.2% to 54.4%, BrowseComp from 28.7% to 67.1%, and WideSearch from 53.9% to 70.2%. Tencent also reports stable agent workflows up to 495 steps.

When to use it

Agent workflows: long multi-tool automation with MCP orchestration.
Coding assistants: code reading, editing, debugging, and cross-file repair.
Hard reasoning: math, STEM, and contest-style problems with high effort.
Long-document processing: 256K context, 192K max input, and cache tiers.
Chinese product integrations: Tencent's own products provide useful deployment proof points.

CrossModel exposes Hy3 Preview through OpenAI-compatible /v1/chat/completions and Anthropic-compatible /v1/messages. Current pricing is available in the model catalog.