Archives AI News

A Multi-Fidelity Control Variate Approach for Policy Gradient Estimation

arXiv:2503.05696v3 Announce Type: replace Abstract: Many reinforcement learning (RL) algorithms are impractical for deployment in operational systems or for training with computationally expensive high-fidelity simulations, as they require large amounts of data. Meanwhile, low-fidelity simulators, such as reduced-order models,…

Beyond Imitation: Recovering Dense Rewards from Demonstrations

arXiv:2510.02493v1 Announce Type: new Abstract: Conventionally, supervised fine-tuning (SFT) is treated as a simple imitation learning process that only trains a policy to imitate expert behavior on demonstration datasets. In this work, we challenge this view by establishing a fundamental…

Risk-Sensitive Agent Compositions

arXiv:2506.04632v2 Announce Type: replace Abstract: From software development to robot control, modern agentic systems decompose complex objectives into a sequence of subtasks and choose a set of specialized AI agents to complete them. We formalize agentic workflows as directed acyclic…

Graph Generation with Spectral Geodesic Flow Matching

arXiv:2510.02520v1 Announce Type: new Abstract: Graph generation is a fundamental task with wide applications in modeling complex systems. Although existing methods align the spectrum or degree profile of the target graph, they often ignore the geometry induced by eigenvectors and…

KAIROS: Unified Training for Universal Non-Autoregressive Time Series Forecasting

arXiv:2510.02084v2 Announce Type: replace Abstract: On the World Wide Web, reliable time series forecasts provide the forward-looking signals that drive resource planning, cache placement, and anomaly response, enabling platforms to operate efficiently as user behavior and content distributions evolve. Compared…

Model-brain comparison using inter-animal transforms

arXiv:2510.02523v1 Announce Type: new Abstract: Artificial neural network models have emerged as promising mechanistic models of the brain. However, there is little consensus on the correct method for comparing model activations to brain responses. Drawing on recent work in philosophy…

L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning

arXiv:2503.04697v2 Announce Type: replace-cross Abstract: Reasoning language models have shown an uncanny ability to improve performance at test time by “thinking longer,” that is, by generating longer chain-of-thought sequences and hence using more compute. However, the length of their chain-of-thought reasoning is…