Archives AI News

Theoretically Optimal Attention/FFN Ratios in Disaggregated LLM Serving

arXiv:2601.21351v2 Announce Type: replace Abstract: Attention-FFN disaggregation (AFD) is an emerging architecture for LLM decoding that separates state-heavy, KV-cache-dominated Attention computation from stateless, compute-intensive FFN computation, connected by per-step communication. While AFD enables independent scaling of memory and compute resources,…
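The split the abstract describes can be illustrated with a minimal sketch: an attention worker that owns the KV cache and an FFN worker that holds only weights, with a function call standing in for the per-step communication. All names and dimensions here are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, D_FF, SEQ = 8, 32, 16  # illustrative sizes, not from the paper

def attention_worker(x, kv_cache):
    """State-heavy side: holds the KV cache and runs one decode step of attention."""
    k, v = kv_cache                          # (seq, d), (seq, d)
    scores = (k @ x) / np.sqrt(D_MODEL)      # (seq,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over cached keys
    return weights @ v                       # attended activation (d,), sent to FFN side

def ffn_worker(h, w1, w2):
    """Stateless side: compute-intensive feed-forward on the communicated activation."""
    return np.maximum(h @ w1, 0.0) @ w2      # ReLU MLP

# One decode step; the hand-off between the two workers models the per-step transfer.
x = rng.normal(size=D_MODEL)
kv_cache = (rng.normal(size=(SEQ, D_MODEL)), rng.normal(size=(SEQ, D_MODEL)))
w1 = rng.normal(size=(D_MODEL, D_FF))
w2 = rng.normal(size=(D_FF, D_MODEL))

h = attention_worker(x, kv_cache)
y = ffn_worker(h, w1, w2)
print(y.shape)  # (8,)
```

Because the FFN worker carries no per-request state, the two sides can be scaled independently, which is the property AFD exploits; only the activation `h` crosses the boundary each step.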

Expert Routing for Communication-Efficient MoE via Finite Expert Banks

arXiv:2605.05278v1 Announce Type: new Abstract: Resource-efficient machine learning increasingly uses sparse Mixture-of-Experts (MoE) architectures, where the gate acts as both a learning component and a routing interface controlling computation, communication, and accuracy. Motivated by finite-rate interpretations of MoE gating, we…
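The gate's dual role as learner and routing interface can be seen in a standard sparse top-k gate, sketched below. This shows conventional MoE routing, not the paper's finite-expert-bank method; all sizes and names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N_EXPERTS, TOP_K, D = 8, 2, 16  # illustrative sizes, not from the paper

def top_k_gate(x, w_gate, k=TOP_K):
    """Score all experts, keep only the top-k, renormalize their weights.
    Only the k selected indices and weights need to be communicated."""
    logits = x @ w_gate                           # (n_experts,)
    idx = np.argsort(logits)[-k:]                 # k highest-scoring experts
    probs = np.exp(logits[idx] - logits[idx].max())
    probs /= probs.sum()                          # softmax over the selected experts
    return idx, probs

x = rng.normal(size=D)
w_gate = rng.normal(size=(D, N_EXPERTS))
experts = [rng.normal(size=(D, D)) for _ in range(N_EXPERTS)]

idx, probs = top_k_gate(x, w_gate)
# Only the k routed experts run, so computation and communication scale with k,
# not with the total number of experts.
y = sum(p * (x @ experts[i]) for i, p in zip(idx, probs))
print(idx.size, y.shape)  # 2 (16,)
```

The gate thus controls all three quantities the abstract lists: computation (which experts run), communication (which indices and weights are sent), and accuracy (how the selected outputs are mixed).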

Pretrained Event Classification Model for High Energy Physics Analysis

arXiv:2412.10665v2 Announce Type: replace-cross Abstract: We introduce a foundation model for event classification in high-energy physics, built on a Graph Neural Network architecture and trained on 120 million simulated proton-proton collision events spanning 12 distinct physics processes. The model is…

Dense Neural Networks are not Universal Approximators

arXiv:2602.07618v5 Announce Type: replace Abstract: We investigate the approximation capabilities of dense neural networks. While universal approximation theorems establish that sufficiently large architectures can approximate arbitrary continuous functions if there are no restrictions on the weight values, we show that…