Generating State-of-the-Art GEMMs with TorchInductor’s CuteDSL backend
Introduction

TorchInductor currently supports three autotuning backends for matrix multiplications: Triton, CUTLASS (C++), and cuBLAS. This post describes the integration of CuteDSL as a fourth backend, the technical motivation for…
