dMX: Differentiable Mixed-Precision Assignment for Low-Precision Floating-Point Formats
arXiv:2606.04115v1 Announce Type: new Abstract: Quantizing large language models (LLMs) to low-precision floating-point representations is central to efficient deployment, yet applying a single bit-width uniformly across all layers is sub-optimal in terms of both performance and accuracy. This work introduces…
