Archives AI News

Estimating the Self-Consistency of LLMs

arXiv:2509.19489v1 (Announce Type: new)
Abstract: Systems often repeat the same prompt to large language models (LLMs) and aggregate responses to improve reliability. This short note analyzes an estimator of the self-consistency of LLMs and the tradeoffs it induces under a…

UserRL: Training Interactive User-Centric Agent via Reinforcement Learning

arXiv:2509.19736v1 (Announce Type: new)
Abstract: Reinforcement learning (RL) has shown promise in training agentic models that move beyond static benchmarks to engage in dynamic, multi-turn interactions. Yet, the ultimate value of such agents lies in their ability to assist users,…

White-Basilisk: A Hybrid Model for Code Vulnerability Detection

arXiv:2507.08540v3 (Announce Type: replace-cross)
Abstract: The proliferation of software vulnerabilities presents a significant challenge to cybersecurity, necessitating more effective detection methodologies. We introduce White-Basilisk, a novel approach to vulnerability detection that demonstrates superior performance while challenging prevailing assumptions in AI…

The Conductor and the Engine: A Path Towards Co-Designed Reasoning

arXiv:2509.19762v1 (Announce Type: new)
Abstract: Modern LLM reasoning relies on extensive test-time computation, driven by internal model training and external agentic orchestration. However, this synergy is often inefficient, as model verbosity and poor instruction following lead to wasted compute. We…

Evaluation-Aware Reinforcement Learning

arXiv:2509.19464v1 (Announce Type: new)
Abstract: Policy evaluation is often a prerequisite for deploying safety- and performance-critical systems. Existing evaluation approaches frequently suffer from high variance due to limited data and long-horizon tasks, or high bias due to unequal support or…