Archives AI News

Robust AI Evaluation through Maximal Lotteries

arXiv:2602.21297v1 Announce Type: new Abstract: The standard way to evaluate language models on subjective tasks is through pairwise comparisons: an annotator chooses the “better” of two responses to a prompt. Leaderboards aggregate these comparisons into a single Bradley-Terry (BT) ranking,…

Urban Vibrancy Embedding and Application on Traffic Prediction

arXiv:2602.21232v1 Announce Type: new Abstract: Urban vibrancy reflects the dynamic human activity within urban spaces and is often measured using mobile data that captures floating population trends. This study proposes a novel approach to derive Urban Vibrancy embeddings from real-time…

Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

arXiv:2602.20156v3 Announce Type: replace-cross Abstract: LLM agents are evolving rapidly, powered by code execution, tools, and the recently introduced agent skills feature. Skills allow users to extend LLM applications with specialized third-party code, knowledge, and instructions. Although this can extend…

Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data

arXiv:2602.21320v1 Announce Type: new Abstract: Large language models (LLMs) are becoming the foundation for autonomous agents that can use tools to solve complex tasks. Reinforcement learning (RL) has emerged as a common approach for injecting such agentic capabilities, but typically…

Parallel Split Learning with Global Sampling

arXiv:2407.15738v5 Announce Type: replace Abstract: Distributed deep learning in resource-constrained environments faces scalability and generalization challenges due to large effective batch sizes and non-identically distributed client data. We introduce a server-driven sampling strategy that maintains a fixed global batch size…