Archives AI News

One-Shot Multi-Label Causal Discovery in High-Dimensional Event Sequences

arXiv:2509.23213v2 Announce Type: replace-cross Abstract: Understanding causality in event sequences with thousands of sparse event types is critical in domains such as healthcare, cybersecurity, or vehicle diagnostics, yet current methods fail to scale. We present OSCAR, a one-shot causal autoregressive…

CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D

arXiv:2511.09904v1 Announce Type: new Abstract: AI systems are increasingly able to autonomously conduct realistic software engineering tasks, and may soon be deployed to automate machine learning (ML) R&D itself. Frontier AI systems may be deployed in safety-critical settings, including to…

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

arXiv:2511.07250v2 Announce Type: replace-cross Abstract: The advent of Multimodal Large Language Models (MLLMs) has expanded AI capabilities to visual modalities, yet existing evaluation benchmarks remain limited to single-video understanding, overlooking the critical need for multi-video understanding in real-world scenarios (e.g.,…

RoboBenchMart: Benchmarking Robots in Retail Environment

arXiv:2511.10276v1 Announce Type: cross Abstract: Most existing robotic manipulation benchmarks focus on simplified tabletop scenarios, typically involving a stationary robotic arm interacting with various objects on a flat surface. To address this limitation, we introduce RoboBenchMart, a more challenging and…

Understanding Human-AI Trust in Education

arXiv:2506.09160v4 Announce Type: replace-cross Abstract: As AI chatbots become integrated in education, students are turning to these systems for guidance, feedback, and information. However, the anthropomorphic characteristics of these chatbots create ambiguity over whether students develop trust in them in…