Archives AI News

Exploring Time-Step Size in Reinforcement Learning for Sepsis Treatment

arXiv:2511.20913v1 Announce Type: new Abstract: Existing studies on reinforcement learning (RL) for sepsis management have mostly followed an established problem setup, in which patient data are aggregated into 4-hour time steps. Although concerns have been raised regarding the coarseness of…

Single- vs. Dual-Policy Reinforcement Learning for Dynamic Bike Rebalancing

arXiv:2402.03589v2 Announce Type: replace Abstract: Bike-sharing systems (BSS) provide a sustainable urban mobility solution, but ensuring their reliability requires effective rebalancing strategies to address stochastic demand and prevent station imbalances. This paper proposes reinforcement learning (RL) algorithms for dynamic rebalancing…

Operationalizing Quantized Disentanglement

arXiv:2511.20927v1 Announce Type: new Abstract: Recent theoretical work established the unsupervised identifiability of quantized factors under any diffeomorphism. The theory assumes that quantization thresholds correspond to axis-aligned discontinuities in the probability density of the latent factors. By constraining a learned…

No Request Left Behind: Tackling Heterogeneity in Long-Context LLM Inference with Medha

arXiv:2409.17264v5 Announce Type: replace Abstract: Deploying million-token Large Language Models (LLMs) is challenging because production workloads are highly heterogeneous, mixing short queries and long documents. This heterogeneity, combined with the quadratic complexity of attention, creates severe convoy effects where long-running…

A Gray-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse

arXiv:2408.10901v4 Announce Type: replace-cross Abstract: Recent advancements in Latent Diffusion Models (LDMs) have revolutionized image synthesis and manipulation, raising significant concerns about data misappropriation and intellectual property infringement. While adversarial attacks have been extensively explored as a protective measure against…

QiMeng-SALV: Signal-Aware Learning for Verilog Code Generation

arXiv:2510.19296v3 Announce Type: replace Abstract: The remarkable progress of Large Language Models (LLMs) presents promising opportunities for Verilog code generation, which is highly important for automated circuit design. The lack of meaningful functional rewards hinders the preference optimization based on…

scipy.spatial.transform: Differentiable Framework-Agnostic 3D Transformations in Python

arXiv:2511.18157v2 Announce Type: replace Abstract: Three-dimensional rigid-body transforms, i.e., rotations and translations, are central to modern differentiable machine learning pipelines in robotics, vision, and simulation. However, numerically robust and mathematically correct implementations, particularly on SO(3), are error-prone due to issues…

Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits

arXiv:2506.14988v4 Announce Type: replace Abstract: We propose a multi-agent multi-armed bandit (MA-MAB) framework aimed at ensuring fair outcomes across agents while maximizing overall system performance. A key challenge in this setting is decision-making under limited information about arm rewards. To…