Archives AI News

One-Shot Multi-Label Causal Discovery in High-Dimensional Event Sequences

arXiv:2509.23213v2. Announce Type: replace-cross. Abstract: Understanding causality in event sequences with thousands of sparse event types is critical in domains such as healthcare, cybersecurity, or vehicle diagnostics, yet current methods fail to scale. We present OSCAR, a one-shot causal autoregressive…

CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D

arXiv:2511.09904v1. Announce Type: new. Abstract: AI systems are increasingly able to autonomously conduct realistic software engineering tasks, and may soon be deployed to automate machine learning (ML) R&D itself. Frontier AI systems may be deployed in safety-critical settings, including to…

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

arXiv:2511.07250v2. Announce Type: replace-cross. Abstract: The advent of Multimodal Large Language Models (MLLMs) has expanded AI capabilities to visual modalities, yet existing evaluation benchmarks remain limited to single-video understanding, overlooking the critical need for multi-video understanding in real-world scenarios (e.g.,…

RoboBenchMart: Benchmarking Robots in Retail Environment

arXiv:2511.10276v1. Announce Type: cross. Abstract: Most existing robotic manipulation benchmarks focus on simplified tabletop scenarios, typically involving a stationary robotic arm interacting with various objects on a flat surface. To address this limitation, we introduce RoboBenchMart, a more challenging and…

OIDA-QA: A Multimodal Benchmark for Analyzing the Opioid Industry Documents Archive

arXiv:2511.09914v1. Announce Type: new. Abstract: The opioid crisis represents a significant moment in public health that reveals systemic shortcomings across regulatory systems, healthcare practices, corporate governance, and public policy. Analyzing how these interconnected systems simultaneously failed to protect public health…

Simulating Misinformation Propagation in Social Networks using Large Language Models

arXiv:2511.10384v1. Announce Type: cross. Abstract: Misinformation on social media thrives on surprise, emotion, and identity-driven reasoning, often amplified through human cognitive biases. To investigate these mechanisms, we model large language model (LLM) personas as synthetic agents that mimic user-level biases,…

Adaptive Hyperbolic Kernels: Modulated Embedding in de Branges-Rovnyak Spaces

arXiv:2511.09921v1. Announce Type: new. Abstract: Hierarchical data pervades diverse machine learning applications, including natural language processing, computer vision, and social network analysis. Hyperbolic space, characterized by its negative curvature, has demonstrated strong potential in such tasks due to its capacity…

Reasoning About Intent for Ambiguous Requests

arXiv:2511.10453v1. Announce Type: cross. Abstract: Large language models often respond to ambiguous requests by implicitly committing to one interpretation. Intent misunderstandings can frustrate users and create safety risks. To address this, we propose generating multiple interpretation-answer pairs in a single…