Zyphra Releases ZAYA1-8B: A Reasoning MoE Trained on AMD Hardware That Punches Far Above Its Weight Class

2 min read · Source: www.marktechpost.com

The story


Zyphra releases ZAYA1-8B, a reasoning Mixture of Experts model with only 760M active parameters that outperforms open-weight models many times its size on math and coding benchmarks, closing in on DeepSeek-V3.2 and, with its novel Markovian RSA test-time compute method, surpassing Claude 4.5 Sonnet on HMMT'25. The model was trained end-to-end on AMD Instinct MI300 hardware and is released under Apache 2.0.

From the source

Zyphra AI has released ZAYA1-8B, a small Mixture of Experts (MoE) language model with 760 million active parameters and 8.4 billion total parameters. Trained end-to-end on AMD hardware, the model outperforms open-weight models many times its size.

With under 1 billion active parameters, ZAYA1-8B achieves scores competitive with first-generation frontier reasoning models such as DeepSeek-R1-0528, Gemini-2.5-Pro, and Claude 4.5 Sonnet on challenging mathematical reasoning tasks. Using its novel test-time compute method, Markovian RSA, it surpasses Claude 4.5 Sonnet and GPT-5-High on HMMT'25 (89.6 vs. 88.3) and closes in on frontier open-weight models like DeepSeek-V3.2 on mathematics benchmarks.

The distinction between active and total parameters matters a great deal. In a standard dense model, every parameter activates for every input token. In a Mixture of Experts model, only a subset of the network's parameters (the experts selected by a router) is activated at inference time. ZAYA1-8B has 8.4B total parameters but only 760M are active per forward pass. This dramatically reduces inference compute and memory bandwidth requirements while retaining the representational capacity of a much larger model.
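To make the active-versus-total distinction concrete, here is a minimal sketch of generic top-k expert routing in Python/NumPy. The hidden size, expert count, and top-k value below are illustrative assumptions, not ZAYA1-8B's actual configuration, and the router is a plain softmax-over-top-k gate rather than Zyphra's specific design.

```python
# Minimal top-k Mixture-of-Experts routing sketch (illustrative only;
# dimensions and expert counts are hypothetical, not ZAYA1-8B's).
import numpy as np

d_model = 64     # hidden size (assumed for illustration)
n_experts = 8    # experts stored in the layer (assumed)
top_k = 2        # experts actually run per token (assumed)

rng = np.random.default_rng(0)
router_w = rng.standard_normal((d_model, n_experts))                  # router weights
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router_w                               # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -top_k:]      # chosen expert indices per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, topk[t]]
        weights = np.exp(chosen) / np.exp(chosen).sum() # softmax over the chosen experts
        for w, e in zip(weights, topk[t]):
            out[t] += w * (x[t] @ experts[e])           # only top_k experts run per token
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_forward(tokens).shape)                        # (4, 64)

# All experts are stored (total parameters), but each token only touches top_k of them.
total_params = n_experts * d_model * d_model
active_params = top_k * d_model * d_model
print(f"total expert params: {total_params}, active per token: {active_params}")
```

The point of the sketch: every expert's weights sit in memory (total parameters), but each token is multiplied against only the few experts it is routed to (active parameters), which is why per-token compute and bandwidth scale with 760M rather than 8.4B.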

Who and what

Key names and topics in this story: Zyphra, ZAYA1-8B, Mixture of Experts (MoE), AMD Instinct MI300, Markovian RSA, DeepSeek-V3.2, Claude 4.5 Sonnet, HMMT'25.

Where to follow next

Read the original article at www.marktechpost.com.
#ai #zyphra #zaya1 #moe #amd
