Mamba Modulation: On the Length Generalization of Mamba
arXiv:2509.19633v1 (Announce Type: cross)
Abstract: The quadratic complexity of the attention mechanism in Transformer models has motivated the development of alternative architectures with sub-quadratic scaling, such as state-space models. Among these, Mamba has emerged as a leading architecture, achieving state-of-the-art…
