From Loop Nests to Silicon: Mapping AI Workloads onto AMD NPUs with MLIR-AIR
arXiv:2510.14871v1 (Announce Type: cross)

Abstract: General-purpose compilers abstract away parallelism, locality, and synchronization, limiting their effectiveness on modern spatial architectures. As modern computing architectures increasingly rely on fine-grained control over data movement, execution order, and compute placement for performance, compiler…
