Model Efficiency on Koby Bibas

https://kobybibas.github.io/tags/model-efficiency/

[Summary] UniPool: Treating MoE Experts as a Global Budget
https://kobybibas.github.io/posts/20260523_unipool_global_expert_pool/summary/
Sat, 23 May 2026 00:00:00 +0000

TL;DR: Modern MoE Transformers usually give each MoE layer its own private expert pool, so expert parameters grow roughly with the number of MoE layers. UniPool addresses this wasteful layer-local allocation by letting different layers route into a shared global expert pool, reducing expert parameters by up to 60% while maintaining similar performance.

Vanilla MoE: Every Layer Gets Its Own Experts

In a traditional MoE Transformer, each MoE layer has its own isolated set of experts.
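To make the scaling argument concrete, here is a minimal parameter-counting sketch. It is not taken from the UniPool post: the function names, the two-matrix feed-forward expert, and all sizes (12 MoE layers, 8 experts per layer, a shared pool of 48 experts) are hypothetical choices for illustration, and router parameters are ignored.

```python
def expert_params(d_model: int, d_ff: int) -> int:
    """Weights in one feed-forward expert: an up-projection and a
    down-projection, biases ignored (an assumed expert shape)."""
    return 2 * d_model * d_ff


def vanilla_expert_params(num_moe_layers: int, experts_per_layer: int,
                          d_model: int, d_ff: int) -> int:
    """Layer-local pools: every MoE layer owns its own experts, so the
    total grows in proportion to the number of MoE layers."""
    return num_moe_layers * experts_per_layer * expert_params(d_model, d_ff)


def shared_pool_expert_params(pool_size: int, d_model: int, d_ff: int) -> int:
    """One global pool that all layers route into: the total depends on
    the pool size, not on the number of layers."""
    return pool_size * expert_params(d_model, d_ff)


# Hypothetical configuration, chosen only to illustrate the arithmetic.
vanilla = vanilla_expert_params(num_moe_layers=12, experts_per_layer=8,
                                d_model=512, d_ff=2048)
shared = shared_pool_expert_params(pool_size=48, d_model=512, d_ff=2048)
reduction = 1 - shared / vanilla

print(f"vanilla expert params: {vanilla:,}")
print(f"shared-pool expert params: {shared:,}")
print(f"reduction: {reduction:.0%}")
```

With these made-up sizes the shared pool holds half as many expert parameters as the layer-local layout. The actual saving depends on how large the global pool is relative to the sum of the per-layer pools it replaces.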