A Unifying View of Linear Function Approximation in Off-Policy RL Through Matrix Splitting and Preconditioning
arXiv:2501.01774v3 Announce Type: replace Abstract: In off-policy policy evaluation (OPE) tasks within reinforcement learning, Temporal Difference Learning (TD) and Fitted Q-Iteration (FQI) have traditionally been viewed as differing in the number of updates toward the target value function: TD makes one…
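The contrast the abstract draws, TD nudging the weights toward the Bellman target while FQI fully re-fits them, can be illustrated on a toy problem. The sketch below is not from the paper: the two-state chain, tabular features, and all function names are this example's own assumptions, chosen only to show one semi-gradient TD(0) step versus one full least-squares FQI iteration for linear policy evaluation.

```python
import numpy as np

# Toy MDP (assumed for illustration): 2 states, deterministic
# transitions 0 -> 1 -> 0 under the evaluated policy, rewards [1, 0].
rng = np.random.default_rng(0)
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])      # P[s, s'] under the evaluated policy
r = np.array([1.0, 0.0])
gamma = 0.9
Phi = np.eye(2)                 # tabular features, so V(s) = Phi[s] @ w

# TD(0): a single stochastic semi-gradient step per sampled transition.
def td0_step(w, s, s_next, alpha=0.1):
    delta = r[s] + gamma * Phi[s_next] @ w - Phi[s] @ w   # TD error
    return w + alpha * delta * Phi[s]

# FQI: each iteration fully fits w to the current Bellman targets
# by solving a least-squares regression.
def fqi_iteration(w):
    targets = r + gamma * P @ (Phi @ w)                    # Bellman backup
    w_new, *_ = np.linalg.lstsq(Phi, targets, rcond=None)  # full re-fit
    return w_new

# Run both; each should approach the true V = (I - gamma * P)^{-1} r.
w_td = np.zeros(2)
s = 0
for _ in range(5000):
    s_next = int(rng.choice(2, p=P[s]))
    w_td = td0_step(w_td, s, s_next)
    s = s_next

w_fqi = np.zeros(2)
for _ in range(200):
    w_fqi = fqi_iteration(w_fqi)

v_true = np.linalg.solve(np.eye(2) - gamma * P, r)
print(np.round(Phi @ w_td, 3), np.round(Phi @ w_fqi, 3), np.round(v_true, 3))
```

Both procedures converge to the same fixed point here; they differ in how far each update moves toward the Bellman target, which is the distinction the paper's matrix-splitting view reinterprets.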
