Archives AI News

Low-Regret and Low-Complexity Learning for Hierarchical Inference

arXiv:2508.08985v3 Announce Type: replace Abstract: This work focuses on Hierarchical Inference (HI) in edge intelligence systems, where a compact Local-ML model on an end-device works in conjunction with a high-accuracy Remote-ML model on an edge-server. HI aims to reduce latency,…
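
The HI pattern described above can be sketched as a confidence-gated offload: the device accepts the Local-ML prediction when it is confident enough and forwards the sample to the Remote-ML model otherwise. The toy models, function names, and fixed threshold below are illustrative assumptions, not the paper's method.

```python
def local_model(x):
    # Toy compact on-device classifier: returns (label, confidence).
    return (1 if x > 0 else 0, min(1.0, abs(x)))

def remote_model(x):
    # Toy high-accuracy edge-server classifier.
    return 1 if x > 0 else 0

def hierarchical_infer(x, threshold=0.8):
    # Accept the local prediction when confidence clears the threshold;
    # offload uncertain samples to the remote model.
    label, conf = local_model(x)
    if conf >= threshold:
        return label, "local"
    return remote_model(x), "remote"

print(hierarchical_infer(2.0))   # confident sample served on-device
print(hierarchical_infer(0.1))   # uncertain sample offloaded
```

The learning problem the paper studies is, roughly, how to set such an offloading rule online with low regret; this sketch only shows the inference-time structure.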

Stable and Efficient Single-Rollout RL for Multimodal Reasoning

arXiv:2512.18215v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become a key paradigm to improve the reasoning capabilities of Multimodal Large Language Models (MLLMs). However, prevalent group-based algorithms such as GRPO require multi-rollout sampling for each prompt.…
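
The group-based baseline that forces multi-rollout sampling can be sketched as follows. This is a generic GRPO-style group normalization, assumed for illustration rather than taken from the paper; it also shows why a single rollout degenerates, which motivates the single-rollout design above.

```python
from statistics import mean, pstdev

def group_advantages(rewards):
    # GRPO-style group-relative advantage: normalize each rollout's
    # reward by the group's mean and standard deviation. With one
    # rollout the std is zero and the learning signal vanishes, which
    # is why group-based methods sample several rollouts per prompt.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0.0:            # degenerate group (e.g. one rollout)
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

print(group_advantages([1.0, 0.0, 1.0, 0.0]))  # -> [1.0, -1.0, 1.0, -1.0]
print(group_advantages([0.5]))                 # single rollout -> [0.0]
```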

Offline Behavioral Data Selection

arXiv:2512.18246v1 Announce Type: new Abstract: Behavioral cloning is a widely adopted approach for offline policy learning from expert demonstrations. However, the large scale of offline behavioral datasets often results in computationally intensive training when used in downstream tasks. In this…
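
Behavioral cloning itself reduces to supervised learning on expert (state, action) pairs; a minimal continuous-action MSE sketch is below (illustrative only, and distinct from the paper's data-selection method, which is elided by the truncation).

```python
def bc_loss(policy, demos):
    # Behavioral cloning as supervised regression: penalize the gap
    # between the policy's action and the expert's action on each
    # demonstrated state. Training cost scales with len(demos), which
    # is what data selection aims to cut.
    return sum((policy(s) - a) ** 2 for s, a in demos) / len(demos)

demos = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]   # toy expert: a = 2s
print(bc_loss(lambda s: 2.0 * s, demos))       # perfect fit -> 0.0
print(bc_loss(lambda s: 0.0, demos))           # poor policy -> positive loss
```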

On the Convergence Rate of LoRA Gradient Descent

arXiv:2512.18248v1 Announce Type: new Abstract: The low-rank adaptation (LoRA) algorithm for fine-tuning large models has grown popular in recent years due to its remarkable performance and low computational requirements. LoRA trains two “adapter” matrices that form a low-rank representation of…
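
The two adapter matrices can be sketched in plain Python: the frozen weight W is augmented by a rank-r product A @ B scaled by alpha/r, following the standard LoRA formulation. The dimensions and values below are toy assumptions; this sketches the parameterization whose gradient-descent convergence the paper analyzes, not the analysis itself.

```python
def matmul(X, Y):
    # Naive dense matrix product over nested lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(x, W, A, B, alpha=1.0, r=1):
    # Effective weight is W + (alpha / r) * (A @ B); W stays frozen
    # and only the low-rank adapters A (d_in x r) and B (r x d_out)
    # are trained.
    base = matmul(x, W)
    delta = matmul(matmul(x, A), B)
    s = alpha / r
    return [[b + s * d for b, d in zip(rb, rd)]
            for rb, rd in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 base weight
A = [[1.0], [1.0]]             # 2x1 adapter (rank 1)
B = [[0.5, 0.5]]               # 1x2 adapter
print(lora_forward([[1.0, 2.0]], W, A, B))  # -> [[2.5, 3.5]]
```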

PEDESTRIAN: An Egocentric Vision Dataset for Obstacle Detection on Pavements

arXiv:2512.19190v1 Announce Type: cross Abstract: Walking has always been a primary mode of transportation and is recognized as an essential activity for maintaining good health. Despite the need for safe walking conditions in urban environments, sidewalks are frequently obstructed by…

Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning

arXiv:2301.11321v3 Announce Type: replace Abstract: Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning, but counteracting off-policy bias without exacerbating variance is challenging. Classically, off-policy bias is corrected in a per-decision manner: past temporal-difference errors are re-weighted by…
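
The per-decision correction the abstract contrasts with its trajectory-aware approach can be sketched for an off-policy TD(λ) eligibility trace. The exact placement of the importance ratio varies across variants, so take this as one common form rather than the paper's method.

```python
def per_decision_trace(trace, grad, rho, gamma=0.99, lam=0.9):
    # One common per-decision off-policy trace update:
    #   e_t = rho_t * (gamma * lambda * e_{t-1} + grad_t)
    # where rho_t = pi(a_t|s_t) / mu(a_t|s_t) re-weights each
    # temporal-difference step by its own importance ratio.
    return [rho * (gamma * lam * e + g) for e, g in zip(trace, grad)]

# Two steps: an on-policy step (rho = 1) then a down-weighted one.
e = [0.0, 0.0]
for rho, grad in [(1.0, [1.0, 0.0]), (0.5, [0.0, 1.0])]:
    e = per_decision_trace(e, grad, rho)
print(e)   # earlier credit decayed by gamma*lambda and scaled by rho
```

A per-decision ratio near zero cuts off all earlier credit in the trace, which is one source of the bias/variance tension the abstract mentions.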

Enhancing Multi-Agent Collaboration with Attention-Based Actor-Critic Policies

arXiv:2507.22782v3 Announce Type: replace-cross Abstract: This paper introduces Team-Attention-Actor-Critic (TAAC), a reinforcement learning algorithm designed to enhance multi-agent collaboration in cooperative environments. TAAC employs a Centralized Training/Centralized Execution scheme incorporating multi-headed attention mechanisms in both the actor and critic. This…
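
Attention across teammates is the core mechanism here; below is a single-head scaled dot-product sketch over per-agent embeddings in pure Python. How TAAC wires this into its actor and critic (and its multi-headed form) is an assumption left to the paper; this only shows the attention primitive.

```python
import math

def attention(queries, keys, values):
    # Scaled dot-product attention: each agent's query attends over
    # all agents' keys, producing a convex combination of values.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)                          # numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [ex / z for ex in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# One agent attending over two teammates' embeddings.
out = attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]],
                [[1.0, 0.0], [0.0, 1.0]])
print(out)   # weights favor the aligned teammate and sum to 1
```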