In this blog post, we explore the kernel design details presented in the paper "Fast and Simplex: 2-Simplicial Attention in Triton" [1]. We begin by modeling the 2-simplicial attention algorithm…
Original: https://pytorch.org/blog/fast-2-simplicial-attention-hardware-efficient-kernels-in-tlx/
