Zyphra Releases ZAYA1-8B: A Reasoning MoE Trained on AMD Hardware That Punches Far Above Its Weight Class

The story

Zyphra has released ZAYA1-8B, a reasoning Mixture of Experts model with only 760M active parameters that outperforms open-weight models many times its size on math and coding benchmarks, closing in on DeepSeek-V3.2 and surpassing Claude 4.5 Sonnet on HMMT'25 with its novel Markovian RSA test-time compute method. The model was trained end-to-end on AMD Instinct MI300 hardware and is released under Apache 2.0.
From the source
Zyphra AI has released ZAYA1-8B, a small Mixture of Experts (MoE) language model with 760 million active parameters and 8.4 billion total parameters. Trained end-to-end on AMD hardware, the model outperforms open-weight models many times its size.
With under 1 billion active parameters, ZAYA1-8B achieves scores competitive with first-generation frontier reasoning models like DeepSeek-R1-0528, Gemini-2.5-Pro, and Claude 4.5 Sonnet on challenging mathematical reasoning tasks. Using its novel test-time compute methodology, Markovian RSA, it surpasses Claude 4.5 Sonnet and GPT-5-High on HMMT'25 (89.6 vs 88.3) and closes in on frontier open-weight models like DeepSeek-V3.2 on mathematics benchmarks.
The distinction between active and total parameters matters a great deal. In a standard dense model, every parameter activates for every input token. In a Mixture of Experts model, only a subset of the network's parameters, the experts selected by a router, is activated at inference time. ZAYA1-8B has 8.4B total parameters, but only 760M are active per forward pass. This dramatically reduces inference compute and memory bandwidth requirements while retaining the representational capacity of a much larger model.
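The routing idea above can be sketched in a few lines. This is a minimal, hedged illustration of top-k expert routing, not ZAYA1's actual architecture: the expert count, dimensions, and top_k here are invented for the example, and the router is a plain linear gate with softmax over the selected experts.

```python
import numpy as np

def moe_forward(x, gate_W, experts, top_k=2):
    """Route one token through only top_k of the experts.
    Illustrative sketch; sizes and routing details are not ZAYA1's real config."""
    logits = x @ gate_W                        # router scores, one per expert
    top = np.argsort(logits)[-top_k:]          # indices of the top_k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over the selected experts only
    # Only the chosen experts run, so compute scales with active, not total, params
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 64, 16
x = rng.standard_normal(d)
gate_W = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]

y = moe_forward(x, gate_W, experts, top_k=2)

total_params = n_experts * d * d               # all expert weights stored
active_params = 2 * d * d                      # only top_k experts' weights used
print(active_params / total_params)            # 0.125 here; same idea as 760M / 8.4B ≈ 0.09
```

The key point the toy example makes: the model must store all expert weights (total parameters), but each token's forward pass touches only the routed subset (active parameters), which is why an 8.4B-parameter MoE can have the inference cost of a sub-1B dense model.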
Who and what
Key names and topics in this story: Zyphra, ZAYA1-8B, Mixture of Experts (MoE), Markovian RSA, AMD Instinct MI300, reasoning benchmarks.
Where to follow next
- Read the full piece at www.marktechpost.com
- More from our AI & prompts coverage

Related stories
How to Build a Single-Cell RNA-seq Analysis Pipeline with Scanpy for PBMC Clustering, Annotation, and Trajectory Discovery
In this tutorial, we perform an advanced single-cell RNA-seq analysis workflow using Scanpy on the PBMC-3k benchmark dataset. We start by loading the dataset, inspecting its structure, and applying quality control checks to evaluate gene counts, total counts, and mitochondrial content.

EMO: Pretraining mixture of experts for emergent modularity
A Blog post by Ai2 on Hugging Face

Meet GitHub Spec-Kit: An Open Source Toolkit for Spec-Driven Development with AI Coding Agents
If you have spent time using AI coding agents such as GitHub Copilot, Claude Code, or Gemini CLI, you have probably run into this situation: you describe what you want, the agent generates a block of code that looks correct, compiles, and then subtly misses the actual intent.

Musk v. Altman week 2: OpenAI fires back, and Shivon Zilis reveals that Musk tried to poach Sam Altman
In the second week of the landmark trial between Elon Musk and OpenAI, Musk's motivations for bringing the suit were under scrutiny. Last week, Musk took the stand, alleging that OpenAI CEO Sam Altman and president Greg Brockman had deceived him into donating $38 million.