Archives AI News

DAQ: Delta-Aware Quantization for Post-Training LLM Weight Compression

arXiv:2603.22324v1 (new). Abstract: We introduce Delta-Aware Quantization (DAQ), a data-free post-training quantization framework that preserves the knowledge acquired during post-training. Standard quantization objectives minimize reconstruction error but are agnostic to the base model, allowing quantization noise to disproportionately…

Hybrid Associative Memories

arXiv:2603.22325v1 (new). Abstract: Recurrent neural networks (RNNs) and self-attention are both widely used sequence-mixing layers that maintain an internal memory. However, this memory is constructed using two orthogonal mechanisms: RNNs compress the entire past into a fixed-size state,…