<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Model Efficiency on Koby Bibas</title><link>https://kobybibas.github.io/tags/model-efficiency/</link><description>Recent content in Model Efficiency on Koby Bibas</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Sat, 23 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://kobybibas.github.io/tags/model-efficiency/index.xml" rel="self" type="application/rss+xml"/><item><title>[Summary] UniPool: Treating MoE Experts as a Global Budget</title><link>https://kobybibas.github.io/posts/20260523_unipool_global_expert_pool/summary/</link><pubDate>Sat, 23 May 2026 00:00:00 +0000</pubDate><guid>https://kobybibas.github.io/posts/20260523_unipool_global_expert_pool/summary/</guid><description>TL;DR: Modern MoE Transformers usually give each MoE layer its own private expert pool, so expert parameters grow roughly with the number of MoE layers. UniPool addresses this wasteful layer-local allocation by letting different layers route into a shared global expert pool, reducing expert parameters by up to 60% while maintaining similar performance.
Vanilla MoE: Every Layer Gets Its Own Experts. In a traditional MoE Transformer, each MoE layer has its own isolated set of experts.</description></item></channel></rss>