Archives AI News

Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning

arXiv:2510.27044v1 Announce Type: new Abstract: Mathematical reasoning is a central challenge for large language models (LLMs), requiring not only correct answers but also faithful reasoning processes. Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising approach for enhancing…

November 3, 2025

Efficient Attention via Pre-Scoring: Prioritizing Informative Keys in Transformers

arXiv:2505.11040v3 Announce Type: replace Abstract: Recent advances in transformer architectures deeply enhanced long-context language modeling. Among them, HyperAttention achieves competitive efficiency by combining a single-level LSH-based clustering with uniform residual sampling. However, HyperAttention fails to find all significant keys, which…

November 3, 2025

Consistency Training Helps Stop Sycophancy and Jailbreaks

arXiv:2510.27062v1 Announce Type: new Abstract: An LLM’s factuality and refusal training can be compromised by simple changes to a prompt. Models often adopt user beliefs (sycophancy) or satisfy inappropriate requests which are wrapped within special text (jailbreaking). We explore consistency…

November 3, 2025

Kernel conditional tests from learning-theoretic bounds

arXiv:2506.03898v2 Announce Type: replace Abstract: We propose a framework for hypothesis testing on conditional probability distributions, which we then use to construct statistical tests of functionals of conditional distributions. These tests identify the inputs where the functionals differ with high…

November 3, 2025

Towards a Measure of Algorithm Similarity

arXiv:2510.27063v1 Announce Type: new Abstract: Given two algorithms for the same problem, can we determine whether they are meaningfully different? In full generality, the question is uncomputable, and empirically it is muddied by competing notions of similarity. Yet, in many…

November 3, 2025

Deep Learning-based Prediction of Clinical Trial Enrollment with Uncertainty Estimates

arXiv:2507.23607v2 Announce Type: replace Abstract: Clinical trials are a systematic endeavor to assess the safety and efficacy of new drugs or treatments. Conducting such trials typically demands significant financial investment and meticulous planning, highlighting the need for accurate predictions of…

November 3, 2025

MLPerf Automotive

arXiv:2510.27065v1 Announce Type: new Abstract: We present MLPerf Automotive, the first standardized public benchmark for evaluating Machine Learning systems that are deployed for AI acceleration in automotive systems. Developed through a collaborative partnership between MLCommons and the Autonomous Vehicle Computing…

November 3, 2025

Demystifying MaskGIT Sampler and Beyond: Adaptive Order Selection in Masked Diffusion

arXiv:2510.04525v2 Announce Type: replace Abstract: Masked diffusion models have shown promising performance in generating high-quality samples in a wide range of domains, but accelerating their sampling process remains relatively underexplored. To investigate efficient samplers for masked diffusion, this paper theoretically…

November 3, 2025

Towards Understanding Self-play for LLM Reasoning

arXiv:2510.27072v1 Announce Type: new Abstract: Recent advances in large language model (LLM) reasoning, led by reinforcement learning with verifiable rewards (RLVR), have inspired self-play post-training, where models improve by generating and solving their own problems. While self-play has shown strong…

November 3, 2025

Accelerating Data Generation for Nonlinear temporal PDEs via homologous perturbation in solution space

arXiv:2510.21592v2 Announce Type: replace Abstract: Data-driven deep learning methods like neural operators have advanced in solving nonlinear temporal partial differential equations (PDEs). However, these methods require large quantities of solution pairs: the solution functions and right-hand sides (RHS) of the equations.…

November 3, 2025