Task-Conditioned Routing Signatures in Sparse Mixture-of-Experts Transformers
arXiv:2603.11114v1 Announce Type: new Abstract: Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for expert selection remain poorly understood. In this work, we introduce routing signatures, a vector representation…
