Archives AI News

ST-PPO: Stabilized Off-Policy Proximal Policy Optimization for Multi-Turn Agents Training

arXiv:2511.20718v1 Announce Type: new Abstract: PPO has been widely adopted for training large language models (LLMs) at the token level in multi-turn dialogue and reasoning tasks. However, its performance is often unstable and prone to collapse. Through empirical analysis, we…

November 27, 2025

A Gray-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse

arXiv:2408.10901v4 Announce Type: replace-cross Abstract: Recent advancements in Latent Diffusion Models (LDMs) have revolutionized image synthesis and manipulation, raising significant concerns about data misappropriation and intellectual property infringement. While adversarial attacks have been extensively explored as a protective measure against…

November 27, 2025

QiMeng-SALV: Signal-Aware Learning for Verilog Code Generation

arXiv:2510.19296v3 Announce Type: replace Abstract: The remarkable progress of Large Language Models (LLMs) presents promising opportunities for Verilog code generation, which is significantly important for automated circuit design. The lack of meaningful functional rewards hinders the preference optimization based on…

November 27, 2025

scipy.spatial.transform: Differentiable Framework-Agnostic 3D Transformations in Python

arXiv:2511.18157v2 Announce Type: replace Abstract: Three-dimensional rigid-body transforms, i.e. rotations and translations, are central to modern differentiable machine learning pipelines in robotics, vision, and simulation. However, numerically robust and mathematically correct implementations, particularly on SO(3), are error-prone due to issues…

November 27, 2025

A Unifying View of Linear Function Approximation in Off-Policy RL Through Matrix Splitting and Preconditioning

arXiv:2501.01774v3 Announce Type: replace Abstract: In off-policy policy evaluation (OPE) tasks within reinforcement learning, Temporal Difference Learning (TD) and Fitted Q-Iteration (FQI) have traditionally been viewed as differing in the number of updates toward the target value function: TD makes one…

November 27, 2025

Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits

arXiv:2506.14988v4 Announce Type: replace Abstract: We propose a multi-agent multi-armed bandit (MA-MAB) framework aimed at ensuring fair outcomes across agents while maximizing overall system performance. A key challenge in this setting is decision-making under limited information about arm rewards. To…

November 27, 2025

Differentiable Physics-Neural Models enable Learning of Non-Markovian Closures for Accelerated Coarse-Grained Physics Simulations

arXiv:2511.21369v1 Announce Type: cross Abstract: Numerical simulations provide key insights into many physical, real-world problems. However, while these simulations are solved on a full 3D domain, most analyses require only a reduced set of metrics (e.g. plane-level concentrations). This work…

November 27, 2025

TraceGen: World Modeling in 3D Trace Space Enables Learning from Cross-Embodiment Videos

arXiv:2511.21690v1 Announce Type: cross Abstract: Learning new robot tasks on new platforms and in new scenes from only a handful of demonstrations remains challenging. While videos of other embodiments – humans and different robots – are abundant, differences in embodiment,…

November 27, 2025

Gradient Descent Algorithm Survey

arXiv:2511.20725v1 Announce Type: new Abstract: Focusing on the practical configuration needs of optimization algorithms in deep learning, this article concentrates on five major algorithms: SGD, Mini-batch SGD, Momentum, Adam, and Lion. It systematically analyzes the core advantages, limitations, and key…

November 27, 2025

Learning from Risk: LLM-Guided Generation of Safety-Critical Scenarios with Prior Knowledge

arXiv:2511.20726v1 Announce Type: new Abstract: Autonomous driving faces critical challenges in rare long-tail events and complex multi-agent interactions, which are scarce in real-world data yet essential for robust safety validation. This paper presents a high-fidelity scenario generation framework that integrates…

November 27, 2025