Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond
arXiv:2410.12982v2
Abstract: While transformers have been at the core of most recent advancements in sequence generative models, their computational cost remains quadratic in sequence length. Several subquadratic architectures have been proposed to address this computational issue. Some…
