Archives AI News

Half the Nonlinearity Is Wasted: Measuring and Reallocating the Transformer’s MLP Budget

arXiv:2603.03459v1 Announce Type: new Abstract: We investigate when transformer MLP nonlinearity is actually necessary. A gate with $d+1$ parameters decides when to replace the full MLP with a linear surrogate. Through systematic investigation across six models (162M-2.8B parameters), two architectures,…

March 5, 2026

Tracing 3D Anatomy in 2D Strokes: A Multi-Stage Projection Driven Approach to Cervical Spine Fracture Identification

arXiv:2601.15235v3 Announce Type: replace-cross Abstract: Cervical spine fractures demand rapid and accurate diagnosis for effective clinical management. This study presents an automated, end-to-end pipeline for fracture detection across cervical vertebrae (C1–C7) that assesses the feasibility of fracture recognition from vertebra-level…

March 5, 2026

A Short Note on a Variant of the Squint Algorithm

arXiv:2603.03409v1 Announce Type: new Abstract: This short note describes a simple variant of the Squint algorithm of Koolen and Van Erven [2015] for the classic expert problem. Via an equally simple modification of their proof, we prove that this variant…

March 5, 2026

[Re] FairDICE: A Gap Between Theory And Practice

arXiv:2603.03454v1 Announce Type: new Abstract: Offline Reinforcement Learning (RL) is an emerging field of RL in which policies are learned solely from demonstrations. Within offline RL, some environments involve balancing multiple objectives, but existing multi-objective offline RL algorithms do not…

March 5, 2026

Heterogeneous Time Constants Improve Stability in Equilibrium Propagation

arXiv:2603.03402v1 Announce Type: new Abstract: Equilibrium propagation (EP) is a biologically plausible alternative to backpropagation for training neural networks. However, existing EP models use a uniform scalar time step dt, which corresponds biologically to a membrane time constant that is…

March 5, 2026

Towards Improved Sentence Representations using Token Graphs

arXiv:2603.03389v1 Announce Type: new Abstract: Obtaining a single-vector representation from a Large Language Model’s (LLM) token-level outputs is a critical step for nearly all sentence-level tasks. However, standard pooling methods like mean or max aggregation treat tokens as an independent…

March 5, 2026

RADAR: Learning to Route with Asymmetry-aware DistAnce Representations

arXiv:2603.03388v1 Announce Type: new Abstract: Recent neural solvers have achieved strong performance on vehicle routing problems (VRPs), yet they mainly assume symmetric Euclidean distances, restricting applicability to real-world scenarios. A core challenge is encoding the relational features in asymmetric distance…

March 5, 2026

Generating Fine Details of Entity Interactions

arXiv:2504.08714v2 Announce Type: replace-cross Abstract: Recent text-to-image models excel at generating high-quality object-centric images from instructions. However, images should also encapsulate rich interactions between objects, where existing models often fall short, likely due to limited training data and benchmarks for…

March 5, 2026

Graph Hopfield Networks: Energy-Based Node Classification with Associative Memory

arXiv:2603.03464v1 Announce Type: new Abstract: We introduce Graph Hopfield Networks, whose energy function couples associative memory retrieval with graph Laplacian smoothing for node classification. Gradient descent on this joint energy yields an iterative update interleaving Hopfield retrieval with Laplacian propagation.…

March 5, 2026

On the Generalization Limits of Quantum Generative Adversarial Networks with Pure State Generators

arXiv:2508.09844v2 Announce Type: replace-cross Abstract: We investigate the capabilities of Quantum Generative Adversarial Networks (QGANs) in image generation tasks. Our analysis centers on fully quantum implementations of both the generator and discriminator. Through extensive numerical testing of current main architectures,…

March 5, 2026