Archives AI News

Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning

arXiv:2510.27044v1 Announce Type: new Abstract: Mathematical reasoning is a central challenge for large language models (LLMs), requiring not only correct answers but also faithful reasoning processes. Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising approach for enhancing…

Efficient Attention via Pre-Scoring: Prioritizing Informative Keys in Transformers

arXiv:2505.11040v3 Announce Type: replace Abstract: Recent advances in transformer architectures deeply enhanced long-context language modeling. Among them, HyperAttention achieves competitive efficiency by combining a single-level LSH-based clustering with uniform residual sampling. However, HyperAttention fails to find all significant keys, which…

Consistency Training Helps Stop Sycophancy and Jailbreaks

arXiv:2510.27062v1 Announce Type: new Abstract: An LLM’s factuality and refusal training can be compromised by simple changes to a prompt. Models often adopt user beliefs (sycophancy) or satisfy inappropriate requests which are wrapped within special text (jailbreaking). We explore consistency…

Kernel conditional tests from learning-theoretic bounds

arXiv:2506.03898v2 Announce Type: replace Abstract: We propose a framework for hypothesis testing on conditional probability distributions, which we then use to construct statistical tests of functionals of conditional distributions. These tests identify the inputs where the functionals differ with high…

Towards a Measure of Algorithm Similarity

arXiv:2510.27063v1 Announce Type: new Abstract: Given two algorithms for the same problem, can we determine whether they are meaningfully different? In full generality, the question is uncomputable, and empirically it is muddied by competing notions of similarity. Yet, in many…

MLPerf Automotive

arXiv:2510.27065v1 Announce Type: new Abstract: We present MLPerf Automotive, the first standardized public benchmark for evaluating Machine Learning systems that are deployed for AI acceleration in automotive systems. Developed through a collaborative partnership between MLCommons and the Autonomous Vehicle Computing…

Demystifying MaskGIT Sampler and Beyond: Adaptive Order Selection in Masked Diffusion

arXiv:2510.04525v2 Announce Type: replace Abstract: Masked diffusion models have shown promising performance in generating high-quality samples in a wide range of domains, but accelerating their sampling process remains relatively underexplored. To investigate efficient samplers for masked diffusion, this paper theoretically…

Towards Understanding Self-play for LLM Reasoning

arXiv:2510.27072v1 Announce Type: new Abstract: Recent advances in large language model (LLM) reasoning, led by reinforcement learning with verifiable rewards (RLVR), have inspired self-play post-training, where models improve by generating and solving their own problems. While self-play has shown strong…