Archives AI News

A Novel Convolution and Attention Mechanism-based Model for 6D Object Pose Estimation

arXiv:2501.01993v2 Announce Type: replace-cross Abstract: This paper proposes PoseLecTr, a graph-based encoder-decoder framework that integrates a novel Legendre convolution with attention mechanisms for six-degree-of-freedom (6-DOF) object pose estimation from monocular RGB images. Conventional learning-based approaches predominantly rely on grid-structured convolutions,…

January 8, 2026

Sensor to Pixels: Decentralized Swarm Gathering via Image-Based Reinforcement Learning

arXiv:2601.03413v1 Announce Type: new Abstract: This study highlights the potential of image-based reinforcement learning methods for addressing swarm-related tasks. In multi-agent reinforcement learning, effective policy learning depends on how agents sense, interpret, and process inputs. Traditional approaches often rely on…

January 8, 2026

A Comparative Analysis of Contextual Representation Flow in State-Space and Transformer Architectures

arXiv:2510.06640v2 Announce Type: replace-cross Abstract: State Space Models (SSMs) have recently emerged as efficient alternatives to Transformer-Based Models (TBMs) for long-sequence processing with linear scaling, yet how contextual information flows across layers in these architectures remains understudied. We present the…

January 8, 2026

Jailbreaking LLMs Without Gradients or Priors: Effective and Transferable Attacks

arXiv:2601.03420v1 Announce Type: new Abstract: As Large Language Models (LLMs) are increasingly deployed in safety-critical domains, rigorously evaluating their robustness against adversarial jailbreaks is essential. However, current safety evaluations often overestimate robustness because existing automated attacks are limited by restrictive…

January 8, 2026

Compact Example-Based Explanations for Language Models

arXiv:2601.03786v1 Announce Type: cross Abstract: Training data influence estimation methods quantify the contribution of training documents to a model’s output, making them a promising source of information for example-based explanations. As humans cannot interpret thousands of documents, only a small…

January 8, 2026

Spectral Archaeology: The Causal Topology of Model Evolution

arXiv:2601.03424v1 Announce Type: new Abstract: Behavioral benchmarks tell us \textit{what} a model does, but not \textit{how}. We introduce a training-free mechanistic probe using attention-graph spectra. Treating each layer as a token graph, we compute algebraic connectivity ($\lambda_2$), smoothness, and spectral…

January 8, 2026

Current Agents Fail to Leverage World Model as Tool for Foresight

arXiv:2601.03905v1 Announce Type: cross Abstract: Agents built on vision-language models increasingly face tasks that demand anticipating future states rather than relying on short-horizon reasoning. Generative world models offer a promising remedy: agents could use them as external simulators to foresee…

January 8, 2026

The Illusion of Specialization: Unveiling the Domain-Invariant “Standing Committee” in Mixture-of-Experts Models

arXiv:2601.03425v1 Announce Type: new Abstract: Mixture of Experts models are widely assumed to achieve domain specialization through sparse routing. In this work, we question this assumption by introducing COMMITTEEAUDIT, a post hoc framework that analyzes routing behavior at the level…

January 8, 2026

A Single-Loop Bilevel Deep Learning Method for Optimal Control of Obstacle Problems

arXiv:2601.04120v1 Announce Type: cross Abstract: Optimal control of obstacle problems arises in a wide range of applications and is computationally challenging due to its nonsmoothness, nonlinearity, and bilevel structure. Classical numerical approaches rely on mesh-based discretization and typically require solving…

January 8, 2026

VNU-Bench: A Benchmarking Dataset for Multi-Source Multimodal News Video Understanding

arXiv:2601.03434v1 Announce Type: new Abstract: News videos are carefully edited multimodal narratives that combine narration, visuals, and external quotations into coherent storylines. In recent years, there have been significant advances in evaluating multimodal large language models (MLLMs) for news video…

January 8, 2026