MoM: Linear Sequence Modeling with Mixture-of-Memories
arXiv:2502.13685v4

Abstract: Linear sequence modeling methods, such as linear attention, state space modeling, and linear RNNs, offer significant efficiency improvements by reducing the complexity of training and inference. However, these methods typically compress the entire input sequence…
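The efficiency claim can be illustrated with a minimal sketch of generic causal linear attention. This is not the MoM method. For simplicity it assumes no softmax, no feature map and no normalization. The pairwise form does O(T^2) work over a sequence of length T. The recurrent form produces the same outputs by folding every token into one fixed-size d_k x d_v state, so memory does not grow with T. That single fixed-size state is the kind of compression the abstract refers to. The function names and toy data are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def quadratic_attention(qs: Matrix, ks: Matrix, vs: Matrix) -> Matrix:
    """Causal unnormalized linear attention, computed pairwise: O(T^2) work."""
    d_v = len(vs[0])
    outs = []
    for t, q in enumerate(qs):
        o = [0.0] * d_v
        for i in range(t + 1):  # attend only to positions i <= t
            w = dot(q, ks[i])
            for j in range(d_v):
                o[j] += w * vs[i][j]
        outs.append(o)
    return outs


def recurrent_attention(qs: Matrix, ks: Matrix, vs: Matrix) -> Matrix:
    """Same outputs from a fixed-size d_k x d_v state, updated once per token."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # size is independent of T
    outs = []
    for q, k, v in zip(qs, ks, vs):
        # S_t = S_{t-1} + k_t v_t^T
        for a in range(d_k):
            for b in range(d_v):
                state[a][b] += k[a] * v[b]
        # o_t = S_t^T q_t
        outs.append([sum(q[a] * state[a][b] for a in range(d_k)) for b in range(d_v)])
    return outs


# Toy example (hypothetical data): 3 tokens, d_k = 2, d_v = 3.
qs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ks = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]]
vs = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [3.0, 1.0, 0.0]]

print(recurrent_attention(qs, ks, vs))
# -> [[1.0, 0.0, 2.0], [2.0, 1.0, 5.0], [6.0, 2.0, 7.0]]
```

The two functions agree because (q_t · k_i) v_i summed over i <= t equals (sum of k_i v_i^T)^T q_t. The recurrent form therefore needs only constant memory per step at inference time. It also keeps all past tokens in one state matrix of fixed size, however long the sequence grows.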
