Archives AI News

Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning

arXiv:2301.11321v3 Announce Type: replace Abstract: Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning, but counteracting off-policy bias without exacerbating variance is challenging. Classically, off-policy bias is corrected in a per-decision manner: past temporal-difference errors are re-weighted by…

Enhancing Multi-Agent Collaboration with Attention-Based Actor-Critic Policies

arXiv:2507.22782v3 Announce Type: replace-cross Abstract: This paper introduces Team-Attention-Actor-Critic (TAAC), a reinforcement learning algorithm designed to enhance multi-agent collaboration in cooperative environments. TAAC employs a Centralized Training/Centralized Execution scheme incorporating multi-headed attention mechanisms in both the actor and critic. This…

Multiperiodic Processes: Ergodic Sources with a Sublinear Entropy

arXiv:2302.09049v3 Announce Type: replace-cross Abstract: We construct multiperiodic processes, a simple example of stationary ergodic (but not mixing) processes over the natural numbers whose entropy rate vanishes under a mild condition. Multiperiodic processes are supported on randomly shifted…