Archives AI News

AutoNumerics-Zero: Automated Discovery of State-of-the-Art Mathematical Functions

arXiv:2312.08472v2 Announce Type: replace-cross Abstract: Transcendental functions, such as the exponential, are central to scientific computing, yet they cannot be natively calculated by digital hardware. Instead, computers must approximate these functions by combining basic operations, such as ${+, -, times,…

A Goal-Set Characterization of Task Composition in the Boolean Task Algebra

arXiv:2606.04053v1 Announce Type: new Abstract: The Boolean Task Algebra (BTA) provides a principled framework for zero-shot task composition in reinforcement learning by equipping goal-reaching tasks with Boolean operations. We revisit its structural assumptions and formalize a collapse in the space…

Path-conditioned training: a principled way to rescale ReLU neural networks

arXiv:2602.19799v2 Announce Type: replace-cross Abstract: Despite recent algorithmic advances, we still lack principled ways to leverage the well-documented rescaling symmetries in ReLU neural network parameters. While two properly rescaled weights implement the same function, the training dynamics can be dramatically…

Spectral Scaling Laws of Muon

arXiv:2606.04058v1 Announce Type: new Abstract: Orthonormalized update rules have rapidly become a leading choice of optimizer for training large language models, with recent open-source state-of-the-art models adopting Muon. To keep these updates tractable, Muon performs the orthonormalization with the Newton–Schulz…

SparDA: Sparse Decoupled Attention for Efficient Long-Context LLM Inference

arXiv:2606.04511v1 Announce Type: cross Abstract: Sparse attention reduces compute and memory bandwidth for long-context LLM inference. However, two key challenges remain: (1) KV cache capacity still grows with sequence length, and offloading to CPU memory introduces a PCIe transfer bottleneck;…

LLM Compression with Jointly Optimizing Architectural and Quantization choices

arXiv:2606.04063v1 Announce Type: new Abstract: Deploying large language models (LLMs) is challenging due to their significant memory and computational requirements. While some methods address this by developing small or tiny language models from scratch, these approaches demand extensive GPU training.…

TPA-AD: A Two-Stage Pseudo Anomaly-Guided Method for Bearing Time-Series Anomaly Detection

arXiv:2606.04073v1 Announce Type: new Abstract: This paper proposes a two-stage pseudo anomaly-guided anomaly detection method (textbf{T}wo-stage textbf{P}seudo textbf{A}nomaly-guided textbf{A}nomaly textbf{D}etection, textbf{TPA-AD}) for axle-box bearing time-series anomaly detection (time series anomaly detection, TSAD) under the setting where only normal samples are…

Adaptive Patching Is Harder Than It Looks For Time-Series Forecasting

arXiv:2606.04074v1 Announce Type: new Abstract: Adaptive patching is a recent and compelling proposal for time-series Transformers: allocate finer patches where the sequence looks locally informative. This paper asks under what conditions a content-adaptive patching operator should outperform a tuned uniform…