Archives AI News

Attn-QAT: 4-Bit Attention With Quantization-Aware Training

arXiv:2603.00040v1 Announce Type: new Abstract: Achieving reliable 4-bit attention is a prerequisite for end-to-end FP4 computation on emerging FP4-capable GPUs, yet attention remains the main obstacle due to FP4’s tiny dynamic range and attention’s heavy-tailed activations. This paper presents the…

March 3, 2026

CARE: Confounder-Aware Aggregation for Reliable LLM Evaluation

arXiv:2603.00039v1 Announce Type: new Abstract: LLM-as-a-judge ensembles are the standard paradigm for scalable evaluation, but their aggregation mechanisms suffer from a fundamental flaw: they implicitly assume that judges provide independent estimates of true quality. However, in practice, LLM judges exhibit…

March 3, 2026

Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning

arXiv:2510.00819v2 Announce Type: replace Abstract: Reinforcement Learning, particularly through policy gradient methods, has played a central role in enabling reasoning capabilities of Large Language Models. However, the optimization stability of policy gradients in this setting remains understudied. As a result,…

March 3, 2026

Breaking the Factorization Barrier in Diffusion Language Models

arXiv:2603.00045v1 Announce Type: new Abstract: Diffusion language models theoretically allow for efficient parallel generation but are practically hindered by the “factorization barrier”: the assumption that simultaneously predicted tokens are independent. This limitation forces a trade-off: models must either sacrifice speed…

March 3, 2026

Accelerating Data Generation for Nonlinear temporal PDEs via homologous perturbation in solution space

arXiv:2510.21592v3 Announce Type: replace Abstract: Data-driven deep learning methods like neural operators have advanced in solving nonlinear temporal partial differential equations (PDEs). However, these methods require large quantities of solution pairs, namely the solution functions and right-hand sides (RHS) of the equations.…

March 3, 2026

REMIND: Rethinking Medical High-Modality Learning under Missingness–A Long-Tailed Distribution Perspective

arXiv:2603.00046v1 Announce Type: new Abstract: Medical multi-modal learning is critical for integrating information from a large set of diverse modalities. However, when leveraging a high number of modalities in real clinical applications, it is often impractical to obtain full-modality observations…

March 3, 2026

PaReGTA: An LLM-based EHR Data Encoding Approach to Capture Temporal Information

arXiv:2602.19661v2 Announce Type: replace Abstract: Temporal information in structured electronic health records (EHRs) is often lost in sparse one-hot or count-based representations, while sequence models can be costly and data-hungry. We propose PaReGTA, an LLM-based encoding framework that (i) converts…

March 3, 2026

BiJEPA: Bi-directional Joint Embedding Predictive Architecture for Symmetric Representation Learning

arXiv:2603.00049v1 Announce Type: new Abstract: Self-Supervised Learning (SSL) has shifted from pixel-level reconstruction to latent space prediction, spearheaded by the Joint Embedding Predictive Architecture (JEPA). While effective, standard JEPA models typically rely on a uni-directional prediction mechanism (e.g. Context $\to$…

March 3, 2026

Concept-TRAK: Understanding how diffusion models learn concepts through concept-level attribution

arXiv:2507.06547v3 Announce Type: replace-cross Abstract: While diffusion models excel at image generation, their growing adoption raises critical concerns about copyright issues and model transparency. Existing attribution methods identify training examples influencing an entire image, but fall short in isolating contributions…

March 3, 2026

Knowledge-guided generative surrogate modeling for high-dimensional design optimization under scarce data

arXiv:2603.00052v1 Announce Type: new Abstract: Surrogate models are widely used in mechanical design and manufacturing process optimization, where high-fidelity computational models may be unavailable or prohibitively expensive. Their effectiveness, however, is often limited by data scarcity, as purely data-driven surrogates…

March 3, 2026