Archives AI News

Expert Divergence Learning for MoE-based Language Models

arXiv:2603.00054v1 Announce Type: new Abstract: The Mixture-of-Experts (MoE) architecture is a powerful technique for scaling language models, yet it often suffers from expert homogenization, where experts learn redundant functionalities, thereby limiting MoE’s full potential. To address this, we introduce Expert…
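"Expert homogenization" means the experts end up computing nearly the same function. One generic way to see it is to compare what the experts output on the same input. This is a minimal sketch that assumes each expert's output on a shared probe token is available as a row vector. The helper name `mean_pairwise_cosine` and the synthetic data are hypothetical, and this is not the Expert Divergence Learning method (the abstract is truncated before describing it). A mean off-diagonal cosine similarity near 1 suggests redundant experts; near 0 suggests they are diverse.

```python
import numpy as np

def mean_pairwise_cosine(expert_outputs: np.ndarray) -> float:
    """Mean cosine similarity over all distinct expert pairs.

    expert_outputs: shape (num_experts, dim), e.g. each expert's output
    on the same probe token (a hypothetical diagnostic setup).
    """
    # Normalize each expert's output vector to unit length.
    normed = expert_outputs / np.linalg.norm(expert_outputs, axis=1, keepdims=True)
    sim = normed @ normed.T  # (E, E) cosine similarity matrix
    e = sim.shape[0]
    # Drop the diagonal (each expert compared with itself is always 1).
    off_diag = sim[~np.eye(e, dtype=bool)]
    return float(off_diag.mean())

rng = np.random.default_rng(0)
base = rng.normal(size=64)
# Synthetic "homogenized" experts: small perturbations of one shared vector.
homogenized = base + 0.05 * rng.normal(size=(8, 64))
# Synthetic "diverse" experts: independent random vectors.
diverse = rng.normal(size=(8, 64))

print(f"homogenized: {mean_pairwise_cosine(homogenized):.3f}")
print(f"diverse:     {mean_pairwise_cosine(diverse):.3f}")
```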

Certainty-Validity: A Diagnostic Framework for Discrete Commitment Systems

arXiv:2603.00070v1 Announce Type: new Abstract: Standard evaluation metrics for machine learning — accuracy, precision, recall, and AUROC — assume that all errors are equivalent: a confident incorrect prediction is penalized identically to an uncertain one. For discrete commitment systems (architectures…
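The point that accuracy treats a confident error the same as an uncertain one can be checked with a toy example. Two hypothetical binary classifiers make identical hard decisions at a 0.5 threshold, so their accuracy is the same. They differ only in how confident they are on the one item they get wrong. Log loss is used here purely as a familiar confidence-sensitive contrast; it is not the paper's Certainty-Validity framework, whose details the truncated abstract does not give.

```python
import math

labels = [1, 1, 0, 0]
# Both hypothetical models get the last item wrong at threshold 0.5.
confident = [0.9, 0.99, 0.1, 0.99]  # last item: confidently wrong
hedged = [0.9, 0.99, 0.1, 0.51]     # last item: barely wrong

def accuracy(probs, ys, threshold=0.5):
    # Fraction of items whose thresholded prediction matches the label.
    return sum((p >= threshold) == bool(y) for p, y in zip(probs, ys)) / len(ys)

def log_loss(probs, ys, eps=1e-12):
    # Binary cross-entropy, averaged over items; clipping avoids log(0).
    total = 0.0
    for p, y in zip(probs, ys):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(ys)

print(accuracy(confident, labels), accuracy(hedged, labels))
print(round(log_loss(confident, labels), 3), round(log_loss(hedged, labels), 3))
```

Accuracy is 0.75 for both models, while log loss is much higher for the model that is confidently wrong.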

Value Flows

arXiv:2510.07650v2 Announce Type: replace Abstract: While most reinforcement learning methods today flatten the distribution of future returns to a single scalar value, distributional RL methods exploit the return distribution to provide stronger learning signals and to enable applications in exploration…
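What "flattening the return distribution to a single scalar" loses can be shown with two hypothetical sets of sampled returns that share the same mean. A scalar value estimate cannot tell them apart, while even simple distributional statistics can. This is a generic illustration only, not the Value Flows method.

```python
import statistics

# Hypothetical sampled returns for two policies.
returns_a = [10.0, 10.0, 10.0, 10.0]  # deterministic outcome
returns_b = [0.0, 20.0, 0.0, 20.0]    # same mean, high variance

# A scalar value estimate (the expected return) is identical for both.
mean_a = statistics.mean(returns_a)
mean_b = statistics.mean(returns_b)

# Distributional quantities expose the difference.
std_a = statistics.pstdev(returns_a)
std_b = statistics.pstdev(returns_b)
p_low_a = sum(r < 5 for r in returns_a) / len(returns_a)  # chance of a poor return
p_low_b = sum(r < 5 for r in returns_b) / len(returns_b)

print(mean_a, mean_b)    # 10.0 10.0
print(std_a, std_b)      # 0.0 10.0
print(p_low_a, p_low_b)  # 0.0 0.5
```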

SEval-NAS: A Search-Agnostic Evaluation for Neural Architecture Search

arXiv:2603.00099v1 Announce Type: new Abstract: Neural architecture search (NAS) automates the discovery of neural networks that meet specified criteria, yet its evaluation procedures are often hardcoded, limiting the ability to introduce new metrics. This issue is especially pronounced in hardware-aware…
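The problem of hardcoded evaluation can be made concrete with a generic plug-in pattern in which metrics live in a registry and the evaluator looks them up by name, so adding a metric does not touch the search loop. This is a minimal sketch under that assumption. `METRICS`, `register_metric`, `evaluate`, the toy dense-layer architecture, and the latency proxy are all hypothetical and are not SEval-NAS's actual interface.

```python
from typing import Callable, Dict

# Hypothetical registry: metric name -> function(architecture) -> float.
METRICS: Dict[str, Callable[[dict], float]] = {}

def register_metric(name: str):
    # Decorator that adds a metric function to the registry under `name`.
    def decorator(fn: Callable[[dict], float]) -> Callable[[dict], float]:
        METRICS[name] = fn
        return fn
    return decorator

@register_metric("num_params")
def num_params(arch: dict) -> float:
    # Toy parameter count for a stack of dense layers (weights + biases).
    widths = arch["widths"]
    return float(sum(a * b + b for a, b in zip(widths, widths[1:])))

@register_metric("depth")
def depth(arch: dict) -> float:
    return float(len(arch["widths"]) - 1)

def evaluate(arch: dict, metric_names: list) -> Dict[str, float]:
    # The evaluator knows nothing about how `arch` was found by the search.
    return {name: METRICS[name](arch) for name in metric_names}

# A new hardware-aware metric plugs in without editing evaluate().
@register_metric("latency_proxy_ms")
def latency_proxy(arch: dict) -> float:
    # Hypothetical proxy: assume 1 ns per multiply-accumulate (1e-6 ms).
    widths = arch["widths"]
    macs = sum(a * b for a, b in zip(widths, widths[1:]))
    return macs * 1e-6

arch = {"widths": [784, 256, 10]}
result = evaluate(arch, ["num_params", "depth", "latency_proxy_ms"])
print(result)
```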

Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs

arXiv:2512.03324v2 Announce Type: replace Abstract: Memory and computation remain core bottlenecks in long-horizon LLM inference due to the quadratic cost of self-attention and the ever-growing key-value (KV) cache. Existing strategies for memory-bounded inference, such as quantization, offloading, or heuristic KV…
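Why the KV cache becomes a bottleneck follows from simple arithmetic: every generated or prompt token stores one key and one value vector per KV head in every layer, so memory grows linearly with sequence length. The sketch below computes this for a hypothetical model configuration (32 layers, 8 KV heads, head dimension 128, fp16); these numbers are assumptions for illustration and are not taken from the paper. It also shows how many tokens fit under a hypothetical fixed memory budget, which is the constraint memory-bounded methods work within.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim,
                   bytes_per_elem=2, batch=1):
    # Factor 2: one key and one value vector per KV head, per layer, per token.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * seq_len * batch

# Hypothetical model configuration (fp16 -> 2 bytes per element).
cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128)

per_token = kv_cache_bytes(1, **cfg)
full_128k = kv_cache_bytes(131_072, **cfg)

budget = 4 * 2**30  # hypothetical 4 GiB memory bound for the cache
max_tokens = budget // per_token

print(per_token // 1024, "KiB per token")          # 128 KiB per token
print(full_128k / 2**30, "GiB at 131,072 tokens")  # 16.0 GiB
print(max_tokens, "tokens fit in the budget")      # 32768
```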

Wideband Power Amplifier Behavioral Modeling Using an Amplitude Conditioned LSTM

arXiv:2603.00101v1 Announce Type: new Abstract: Wideband power amplifiers exhibit complex nonlinear and memory effects that challenge traditional behavioral modeling approaches. This paper proposes a novel amplitude conditioned long short-term memory (AC-LSTM) network that introduces explicit amplitude-dependent gating to enhance the…