Archives AI News

Attn-QAT: 4-Bit Attention With Quantization-Aware Training

arXiv:2603.00040v1. Announce Type: new. Abstract: Achieving reliable 4-bit attention is a prerequisite for end-to-end FP4 computation on emerging FP4-capable GPUs, yet attention remains the main obstacle due to FP4’s tiny dynamic range and attention’s heavy-tailed activations. This paper presents the…

CARE: Confounder-Aware Aggregation for Reliable LLM Evaluation

arXiv:2603.00039v1. Announce Type: new. Abstract: LLM-as-a-judge ensembles are the standard paradigm for scalable evaluation, but their aggregation mechanisms suffer from a fundamental flaw: they implicitly assume that judges provide independent estimates of true quality. However, in practice, LLM judges exhibit…

Breaking the Factorization Barrier in Diffusion Language Models

arXiv:2603.00045v1. Announce Type: new. Abstract: Diffusion language models theoretically allow for efficient parallel generation but are practically hindered by the “factorization barrier”: the assumption that simultaneously predicted tokens are independent. This limitation forces a trade-off: models must either sacrifice speed…

PaReGTA: An LLM-based EHR Data Encoding Approach to Capture Temporal Information

arXiv:2602.19661v2. Announce Type: replace. Abstract: Temporal information in structured electronic health records (EHRs) is often lost in sparse one-hot or count-based representations, while sequence models can be costly and data-hungry. We propose PaReGTA, an LLM-based encoding framework that (i) converts…