Archives AI News

Amortized Vine Copulas for High-Dimensional Density and Information Estimation

arXiv:2604.20568v2 Announce Type: replace Abstract: Modeling high-dimensional dependencies while keeping likelihoods tractable remains challenging. Classical vine-copula pipelines are interpretable but can be expensive, while many neural estimators are flexible but less structured. In this work, we propose Vine Denoising Copula…
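Any copula pipeline, vine or otherwise, begins by mapping each margin to approximately uniform pseudo-observations via the rank-based probability-integral transform. A minimal stdlib-only sketch of that first step (the function name and the /(n+1) convention are illustrative, not taken from the paper):

```python
def pseudo_observations(sample):
    """Rank-based probability-integral transform: maps a 1-D sample to
    (0,1) pseudo-uniforms, the standard preprocessing step before
    fitting any copula model to the joint dependence structure."""
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)  # /(n+1) keeps values strictly inside (0,1)
    return u

print(pseudo_observations([3.2, -1.0, 0.5]))  # [0.75, 0.25, 0.5]
```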

MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference

arXiv:2605.05225v1 Announce Type: new Abstract: Mixture-of-Experts Multimodal Large Language Models (MoE MLLMs) suffer from a significant efficiency bottleneck during Expert Parallelism (EP) inference due to the straggler effect. This issue is worsened in the multimodal context, as existing token-count-based load…
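The straggler effect the abstract describes can be seen in a toy latency model: under Expert Parallelism every device waits for the most-loaded expert, so balancing raw token counts fails when modalities differ in per-token compute cost. A hypothetical sketch (the cost figures and function are assumptions for illustration, not MACS itself):

```python
def step_latency(expert_tokens, cost_per_token):
    """expert_tokens: per-expert dicts of modality -> token count.
    cost_per_token: assumed relative compute cost per modality."""
    loads = [sum(n * cost_per_token[m] for m, n in tok.items())
             for tok in expert_tokens]
    return max(loads)  # EP synchronizes: the slowest expert gates the step

# Two experts with identical token counts (balanced by a token-count scheduler),
# but one receives only vision tokens, assumed 3x as costly as text tokens.
experts = [{"text": 100, "vision": 0}, {"text": 0, "vision": 100}]
cost = {"text": 1.0, "vision": 3.0}
print(step_latency(experts, cost))  # 300.0: the vision-heavy expert straggles
```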

High entropy leads to symmetry equivariant policies in Dec-POMDPs

arXiv:2511.22581v4 Announce Type: replace Abstract: We prove that in any Dec-POMDP, sufficiently high entropy regularization ensures that the policy gradient flow with tabular softmax parametrization always converges, for any initialization, to the same joint policy, and that this joint policy…
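The effect of entropy regularization can be seen even in a one-state softmax "bandit", a drastic simplification of the Dec-POMDP setting: with bonus tau*H(pi), gradient ascent converges from any initialization to the unique policy pi proportional to exp(r/tau). A stdlib-only sketch (all names and hyperparameters are illustrative):

```python
import math
import random

def softmax(logits):
    m = max(logits)
    e = [math.exp(l - m) for l in logits]
    s = sum(e)
    return [x / s for x in e]

def entropy_reg_gradient_ascent(rewards, tau, steps=5000, lr=0.1, seed=0):
    """Tabular softmax policy gradient on a one-state bandit with entropy
    bonus tau*H(pi); a toy proxy for the paper's setting, not its proof."""
    rng = random.Random(seed)
    logits = [rng.uniform(-3.0, 3.0) for _ in rewards]
    for _ in range(steps):
        p = softmax(logits)
        # Gradient of sum_a p_a*(r_a - tau*log p_a) w.r.t. the logits:
        v = sum(pa * (ra - tau * math.log(pa)) for pa, ra in zip(p, rewards))
        g = [pa * ((ra - tau * math.log(pa)) - v) for pa, ra in zip(p, rewards)]
        logits = [l + lr * gi for l, gi in zip(logits, g)]
    return softmax(logits)

# Different initializations converge to the same policy, softmax(r / tau):
print(entropy_reg_gradient_ascent([1.0, 0.0], tau=1.0, seed=0))
print(entropy_reg_gradient_ascent([1.0, 0.0], tau=1.0, seed=42))
```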

Graph Normalization: Fast Binarizing Dynamics for Differentiable MWIS

arXiv:2605.05330v1 Announce Type: new Abstract: We introduce Graph Normalization (GN), a principled dynamical system on graphs that serves as a differentiable approximation engine for the NP-hard Maximum Weight Independent Set (MWIS) problem. MWIS encompasses many combinatorial challenges, including optimal assignment,…
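A generic differentiable relaxation of MWIS (not the paper's Graph Normalization dynamics) runs gradient ascent on a penalized objective over relaxed indicators x = sigmoid(theta) and rounds at the end; choosing the edge penalty lam larger than every weight makes violating an edge unprofitable. A hedged sketch:

```python
import math

def mwis_relaxation(weights, edges, lam=None, steps=2000, lr=0.2):
    """Round a gradient-ascent solution of the relaxed objective
    sum_i w_i*x_i - lam * sum_{(i,j) in E} x_i*x_j, with x_i = sigmoid(theta_i)."""
    n = len(weights)
    if lam is None:
        lam = 2.0 * max(weights)  # edge penalty dominates any single weight
    theta = [0.0] * n
    for _ in range(steps):
        x = [1.0 / (1.0 + math.exp(-t)) for t in theta]
        gx = list(weights)           # d(objective)/dx_i: linear reward term
        for i, j in edges:
            gx[i] -= lam * x[j]      # quadratic penalty pushes neighbors apart
            gx[j] -= lam * x[i]
        for i in range(n):
            theta[i] += lr * gx[i] * x[i] * (1.0 - x[i])  # chain rule via sigmoid
    x = [1.0 / (1.0 + math.exp(-t)) for t in theta]
    return [1 if xi > 0.5 else 0 for xi in x]

# Triangle with one heavy node: the relaxation settles on the heavy vertex.
print(mwis_relaxation([3.0, 1.0, 1.0], [(0, 1), (0, 2), (1, 2)]))  # [1, 0, 0]
```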

Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention

arXiv:2605.05892v1 Announce Type: cross Abstract: Activation steering has emerged as a promising alternative for controlling language-model behavior at inference time by modifying intermediate representations while keeping model parameters frozen. However, large-scale evaluations such as AxBench show that existing steering methods…
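The baseline these methods build on is a fixed steering vector: at inference time, the hidden state at a chosen layer is shifted by h' = h + alpha*v while all weights stay frozen; flow-based approaches replace this constant shift with a state-dependent map. A minimal illustration (the vectors and scale are made up):

```python
def steer(hidden, direction, alpha):
    """Shift a hidden-state vector along a fixed steering direction:
    h' = h + alpha * v. Model parameters are untouched; only the
    forward activation at a chosen layer is modified."""
    return [h + alpha * v for h, v in zip(hidden, direction)]

h = [0.25, -1.0, 0.5]
v = [1.0, 0.0, -1.0]  # e.g. a "concept" direction found by probing
print(steer(h, v, 2.0))  # [2.25, -1.0, -1.5]
```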

Feature Starvation as Geometric Instability in Sparse Autoencoders

arXiv:2605.05341v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) are used to disentangle the dense, polysemantic internal representations of large language models (LLMs) into interpretable, monosemantic concepts. However, standard $\ell_1$-regularized SAEs suffer from feature starvation (dead neurons) and shrinkage bias, often…
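Both pathologies named here fall out of the $\ell_1$ penalty's closed-form effect on a single activation, the soft-threshold operator: activations below the threshold are zeroed (a unit that never clears it becomes a dead neuron), and surviving activations are uniformly shrunk toward zero (shrinkage bias). A one-function sketch:

```python
def soft_threshold(a, lam):
    """Proximal operator of lam*|a|: the per-activation effect of an
    l1 penalty. Values below lam die; survivors shrink by exactly lam."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0

acts = [0.25, 1.0, 2.0]
print([soft_threshold(a, 0.5) for a in acts])  # [0.0, 0.5, 1.5]
```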