Archives AI News

One-Shot Multi-Label Causal Discovery in High-Dimensional Event Sequences

arXiv:2509.23213v2 Announce Type: replace-cross Abstract: Understanding causality in event sequences with thousands of sparse event types is critical in domains such as healthcare, cybersecurity, or vehicle diagnostics, yet current methods fail to scale. We present OSCAR, a one-shot causal autoregressive…

November 15, 2025

CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D

arXiv:2511.09904v1 Announce Type: new Abstract: AI systems are increasingly able to autonomously conduct realistic software engineering tasks, and may soon be deployed to automate machine learning (ML) R&D itself. Frontier AI systems may be deployed in safety-critical settings, including to…

November 15, 2025

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

arXiv:2511.07250v2 Announce Type: replace-cross Abstract: The advent of Multimodal Large Language Models (MLLMs) has expanded AI capabilities to visual modalities, yet existing evaluation benchmarks remain limited to single-video understanding, overlooking the critical need for multi-video understanding in real-world scenarios (e.g.,…

November 15, 2025

Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis for Large Reasoning Models

arXiv:2511.09907v1 Announce Type: new Abstract: Data synthesis for training large reasoning models offers a scalable alternative to limited, human-curated datasets, enabling the creation of high-quality data. However, existing approaches face several challenges: (i) indiscriminate generation that ignores the solver’s ability…

November 15, 2025

RoboBenchMart: Benchmarking Robots in Retail Environment

arXiv:2511.10276v1 Announce Type: cross Abstract: Most existing robotic manipulation benchmarks focus on simplified tabletop scenarios, typically involving a stationary robotic arm interacting with various objects on a flat surface. To address this limitation, we introduce RoboBenchMart, a more challenging and…

November 15, 2025

OIDA-QA: A Multimodal Benchmark for Analyzing the Opioid Industry Documents Archive

arXiv:2511.09914v1 Announce Type: new Abstract: The opioid crisis represents a significant moment in public health that reveals systemic shortcomings across regulatory systems, healthcare practices, corporate governance, and public policy. Analyzing how these interconnected systems simultaneously failed to protect public health…

November 15, 2025

Simulating Misinformation Propagation in Social Networks using Large Language Models

arXiv:2511.10384v1 Announce Type: cross Abstract: Misinformation on social media thrives on surprise, emotion, and identity-driven reasoning, often amplified through human cognitive biases. To investigate these mechanisms, we model large language model (LLM) personas as synthetic agents that mimic user-level biases,…

November 15, 2025

Adaptive Hyperbolic Kernels: Modulated Embedding in de Branges-Rovnyak Spaces

arXiv:2511.09921v1 Announce Type: new Abstract: Hierarchical data pervades diverse machine learning applications, including natural language processing, computer vision, and social network analysis. Hyperbolic space, characterized by its negative curvature, has demonstrated strong potential in such tasks due to its capacity…

November 15, 2025

Reasoning About Intent for Ambiguous Requests

arXiv:2511.10453v1 Announce Type: cross Abstract: Large language models often respond to ambiguous requests by implicitly committing to one interpretation. Intent misunderstandings can frustrate users and create safety risks. To address this, we propose generating multiple interpretation-answer pairs in a single…

November 15, 2025

SPAN: Benchmarking and Improving Cross-Calendar Temporal Reasoning of Large Language Models

arXiv:2511.09993v1 Announce Type: new Abstract: We introduce SPAN, a cross-calendar temporal reasoning benchmark, which requires LLMs to perform intra-calendar temporal reasoning and inter-calendar temporal conversion. SPAN features ten cross-calendar temporal reasoning directions, two reasoning types, and two question formats across…

November 15, 2025