Archives AI News

Disentangling Recall and Reasoning in Transformer Models through Layer-wise Attention and Activation Analysis

arXiv:2510.03366v2 Announce Type: replace Abstract: Transformer-based language models excel at both recall (retrieving memorized facts) and reasoning (performing multi-step inference), but whether these abilities rely on distinct internal mechanisms remains unclear. Distinguishing recall from reasoning is crucial for predicting model…

March 16, 2026

On the Geometric Coherence of Global Aggregation in Federated Graph Neural Networks

arXiv:2602.15510v2 Announce Type: replace Abstract: Federated Learning (FL) enables distributed training across multiple clients without centralized data sharing, while Graph Neural Networks (GNNs) model relational data through message passing. In federated GNN settings, client graphs often exhibit heterogeneous structural and…

March 16, 2026

SortScrews: A Dataset and Baseline for Real-time Screw Classification

arXiv:2603.13027v1 Announce Type: cross Abstract: Automatic identification of screw types is important for industrial automation, robotics, and inventory management. However, publicly available datasets for screw classification are scarce, particularly for controlled single-object scenarios commonly encountered in automated sorting systems. In…

March 16, 2026

Dual Filter: A Transformer-like Inference Architecture for Hidden Markov Models

arXiv:2505.00818v2 Announce Type: replace Abstract: This paper presents a mathematical framework for causal nonlinear prediction in settings where observations are generated from an underlying hidden Markov model (HMM). Both the problem formulation and the proposed solution are motivated by the…

March 16, 2026

Thermodynamics of Reinforcement Learning Curricula

arXiv:2603.12324v1 Announce Type: new Abstract: Connections between statistical mechanics and machine learning have repeatedly proven fruitful, providing insight into optimization, generalization, and representation learning. In this work, we follow this tradition by leveraging results from non-equilibrium thermodynamics to formalize curriculum…

March 16, 2026

Maximum Entropy Exploration Without the Rollouts

arXiv:2603.12325v1 Announce Type: new Abstract: Efficient exploration remains a central challenge in reinforcement learning, serving as a useful pretraining objective for data collection, particularly when an external reward function is unavailable. A principled formulation of the exploration problem is to…

March 16, 2026

A Geometrically-Grounded Drive for MDL-Based Optimization in Deep Learning

arXiv:2603.12304v1 Announce Type: new Abstract: This paper introduces a novel optimization framework that fundamentally integrates the Minimum Description Length (MDL) principle into the training dynamics of deep neural networks. Moving beyond its conventional role as a model selection criterion, we…

March 16, 2026

HCP-DCNet: A Hierarchical Causal Primitive Dynamic Composition Network for Self-Improving Causal Understanding

arXiv:2603.12305v1 Announce Type: new Abstract: The ability to understand and reason about cause and effect — encompassing interventions, counterfactuals, and underlying mechanisms — is a cornerstone of robust artificial intelligence. While deep learning excels at pattern recognition, it fundamentally lacks…

March 16, 2026

Global Evolutionary Steering: Refining Activation Steering Control via Cross-Layer Consistency

arXiv:2603.12298v1 Announce Type: new Abstract: Activation engineering enables precise control over Large Language Models (LLMs) without the computational cost of fine-tuning. However, existing methods deriving vectors from static activation differences are susceptible to high-dimensional noise and layer-wise semantic drift, often…

March 16, 2026

Synthetic Data Generation for Brain-Computer Interfaces: Overview, Benchmarking, and Future Directions

arXiv:2603.12296v1 Announce Type: new Abstract: Deep learning has achieved transformative performance across diverse domains, largely driven by large-scale, high-quality training data. In contrast, the development of brain-computer interfaces (BCIs) is fundamentally constrained by the limited, heterogeneous, and privacy-sensitive neural…

March 16, 2026