EMO: Pretraining mixture of experts for emergent modularity

Team article published May 8, 2026 by Kyle Wiggers (Ai2Comms) and Ryan Wang (ryanyxw) of allenai.

🧠 Models: https://huggingface.co/collections/allenai/emo | 📄 Tech report: https://allenai.org/papers/emo | 💻 Code: https://github.com/allenai/EMO | 📊 Visualization: https://emovisualization.netlify.app/

Today we're releasing EMO, a new mixture-of-experts (MoE) model pretrained end-to-end so that modular structure emerges directly from the data, without relying on human-defined priors. EMO lets you use a small subset of its experts (just 12.5% of the total) for a given task while keeping near full-model performance, and it still works as a strong general-purpose model when all experts are used together.
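To make the expert-subsetting idea concrete, here is a minimal sketch of a top-k routed MoE layer in which routing can be restricted to a retained subset of experts. This is hypothetical illustration code, not EMO's actual architecture or API: the `MoELayer` class, the layer sizes, and the `active_experts` argument are assumptions chosen for the example.

```python
# Sketch of expert subsetting in a top-k routed MoE layer (hypothetical,
# not EMO's implementation). It illustrates how keeping 12.5% of experts
# (e.g. 8 of 64) can serve a task if routing concentrates on that subset.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, n_experts=64, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x, active_experts=None):
        # x: (tokens, d_model). active_experts: optional list of expert
        # indices to retain; None means use the full model.
        logits = self.router(x)
        if active_experts is not None:
            # Mask pruned experts so top-k routing only selects from the
            # retained subset; the other experts are never executed.
            mask = torch.full_like(logits, float("-inf"))
            mask[:, active_experts] = 0.0
            logits = logits + mask
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in idx[:, slot].unique().tolist():
                sel = idx[:, slot] == e
                out[sel] += weights[sel, slot:slot + 1] * self.experts[e](x[sel])
        return out

layer = MoELayer()
tokens = torch.randn(16, 512)
full = layer(tokens)                                   # all 64 experts available
subset = layer(tokens, active_experts=list(range(8)))  # 12.5% of experts
```

With 64 experts, retaining 8 matches the 12.5% figure above. Whether the pruned model stays close to full-model quality depends on routing actually concentrating on a task-specific subset, which is the modularity EMO's end-to-end pretraining is designed to make emerge.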

Large language models are typically trained and deployed as monolithic systems: a single model is initialized, pretrained, fine-tuned, and served as one unified entity. But applications often need only a subset of capabilities, such as code generation, mathematical reasoning, or domain-specific knowledge. As frontier language models routinely reach trillions of parameters, using and adapting the full model becomes impractical for most users, incurring unnecessary compute and memory costs to host parameters that may never be needed.
