Archives AI News

LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation

arXiv:2602.16953v1 Announce Type: cross Abstract: Execution-aware LLM agents offer a promising paradigm for learning from tool feedback, but such feedback is often expensive and slow to obtain, making online reinforcement learning (RL) impractical. High-coverage hardware verification exemplifies this challenge due…

Attending to Routers Aids Indoor Wireless Localization

arXiv:2602.16762v1 Announce Type: new Abstract: Modern machine learning-based wireless localization using Wi-Fi signals continues to face significant challenges in achieving groundbreaking performance across diverse environments. A major limitation is that most existing algorithms do not appropriately weight the information from…

Omitted Variable Bias in Language Models Under Distribution Shift

arXiv:2602.16784v1 Announce Type: new Abstract: Despite their impressive performance on a wide variety of tasks, modern language models remain susceptible to distribution shifts, exhibiting brittle behavior when evaluated on data that differs in distribution from their training data. In this…

Defining and Evaluating Physical Safety for Large Language Models

arXiv:2411.02317v2 Announce Type: replace Abstract: Large Language Models (LLMs) are increasingly used to control robotic systems such as drones, but their risks of causing physical threats and harm in real-world applications remain unexplored. Our study addresses the critical gap in…

Continuous-Time Value Iteration for Multi-Agent Reinforcement Learning

arXiv:2509.09135v3 Announce Type: replace Abstract: Existing reinforcement learning (RL) methods struggle with complex dynamical systems that demand interactions at high frequencies or irregular time intervals. Continuous-time RL (CTRL) has emerged as a promising alternative by replacing discrete-time Bellman recursion with…

Escaping the Cognitive Well: Efficient Competition Math with Off-the-Shelf Models

arXiv:2602.16793v1 Announce Type: new Abstract: In the past year, custom and unreleased math reasoning models reached gold medal performance on the International Mathematical Olympiad (IMO). Similar performance was then reported using large-scale inference on publicly available models but at prohibitive…