Hardware-Efficient Softmax and Layer Normalization with Guaranteed Normalization for Edge Devices

2026-04-27 19:00 GMT · aimagpro.com

arXiv:2604.23647v1 Announce Type: cross
Abstract: In Transformer models, non-GEMM (non-General Matrix Multiplication) operations, especially Softmax and Layer Normalization (LayerNorm), often dominate hardware cost due to their nonlinear nature. To address this, previous approximation studies mainly target rank-oriented tasks, which is acceptable for classification. However, edge Natural Language Processing (NLP) applications and edge generative AI are largely evaluated based on score-oriented tasks, so normalization-guaranteed non-GEMM operations are essential. We propose a hardware-efficient Softmax and LayerNorm with Guaranteed Normalization for Edge devices. Our design employs hardware-efficient approximation methods while preserving the normalization (Softmax: $\sum p = 1$, LayerNorm: $\sigma = 1$). Our architecture is described in Verilog HDL and synthesized using the Samsung 28nm CMOS process. In accuracy evaluation, we achieve high accuracy with minimal degradation: GLUE +0.07%, SQuAD -0.01%, perplexity -0.09%. Implementation results show that our architecture is small: $942\,\mu m^2$ for Softmax, $1199\,\mu m^2$ for LayerNorm. Compared to the state of the art, we achieve up to 11x and 14x reduction in area, respectively.
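
The distinction the abstract draws between rank-oriented and score-oriented use can be illustrated with a small sketch. It is not the paper's architecture: the piecewise-linear `approx_exp2`, the shift-only divisor in `softmax_pow2_divisor`, and all function names are assumptions chosen for illustration. Both variants preserve the ranking of the outputs, but only the one that divides by the sum of the approximated exponentials keeps the Softmax condition $\sum p = 1$.

```python
import math

LOG2_E = 1.4426950408889634  # log2(e), converts e**x into 2**(x * log2(e))


def approx_exp2(x: float) -> float:
    """Piecewise-linear 2**x: exact at integers, linear in between.

    Illustrative hardware-style approximation (assumption, not the paper's).
    """
    n = math.floor(x)                 # integer part, a shift amount in hardware
    frac = x - n                      # fractional part in [0, 1)
    return math.ldexp(1.0 + frac, n)  # (1 + frac) * 2**n


def _approx_exps(xs):
    """Approximated exp(x - max(xs)) for every element."""
    m = max(xs)  # subtract the maximum so every exponent is <= 0
    return [approx_exp2((x - m) * LOG2_E) for x in xs]


def softmax_pow2_divisor(xs):
    """Rank-preserving variant: the divisor is rounded to a power of two,
    so the division becomes a shift, but the outputs need not sum to 1."""
    e = _approx_exps(xs)
    shift = round(math.log2(sum(e)))
    return [math.ldexp(v, -shift) for v in e]  # v / 2**shift


def softmax_normalized(xs):
    """Normalization-preserving variant: divide by the sum of the SAME
    approximated values, so the outputs sum to 1 up to float rounding."""
    e = _approx_exps(xs)
    s = sum(e)
    return [v / s for v in e]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1, -3.0]
    print("shift-only divisor sum:", sum(softmax_pow2_divisor(logits)))
    print("normalized sum:        ", sum(softmax_normalized(logits)))
```

For classification only the largest entry matters, so either variant gives the same answer. A score-oriented metric such as perplexity reads the probabilities themselves, which is where a sum that drifts from 1 becomes visible. The analogous check for LayerNorm is that the standard deviation of the output equals 1.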