Archives AI News

Streaming Structured Inference with Flash-SemiCRF

arXiv:2604.18780v1 Announce Type: new Abstract: Semi-Markov Conditional Random Fields (semi-CRFs) assign labels to segments of a sequence rather than to individual positions, enabling exact inference over segment-level features and principled uncertainty estimates at segment boundaries. However, existing implementations must materialize…
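
The segment-level inference the abstract refers to can be illustrated with a minimal segmental Viterbi pass (a generic semi-CRF sketch, not the paper's Flash-SemiCRF implementation): the dynamic program considers every segment end, every segment length up to a cap, and every label, scoring whole segments at once via a user-supplied `seg_score` function (a hypothetical stand-in for learned segment features).

```python
import numpy as np

def semicrf_viterbi(T, labels, max_len, seg_score):
    """Segmental Viterbi: best partition of positions 0..T-1 into
    contiguous segments of length <= max_len, one label per segment.
    seg_score(start, end, label) scores the half-open segment [start, end).
    This is the textbook semi-CRF recursion, not the paper's kernel."""
    NEG = -1e18
    best = np.full(T + 1, NEG)   # best[e] = best score of a labeling of [0, e)
    best[0] = 0.0
    back = [None] * (T + 1)      # backpointers: (segment start, label)
    for end in range(1, T + 1):
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            for y in labels:
                s = best[start] + seg_score(start, end, y)
                if s > best[end]:
                    best[end] = s
                    back[end] = (start, y)
    # walk backpointers to recover the segmentation
    segs, end = [], T
    while end > 0:
        start, y = back[end]
        segs.append((start, end, y))
        end = start
    return best[T], segs[::-1]
```

Note the cost is O(T · max_len · |labels|) plus the cost of the segment scores themselves; the materialization of those per-segment scores is exactly what the abstract says existing implementations struggle with.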

Time-Scale Coupling Between States and Parameters in Recurrent Neural Networks

arXiv:2508.12121v5 Announce Type: replace Abstract: We show that gating mechanisms in recurrent neural networks (RNNs) induce lag-dependent and direction-dependent effective learning rates, even when training uses a fixed, global step size. This behavior arises from a coupling between state-space time-scales…
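
The lag-dependence claim has a simple one-gate illustration (a toy leaky recurrence of my own construction, not the paper's model): for h_t = λ·h_{t-1} + (1−λ)·x_t, the sensitivity of h_T to an input k steps in the past is λ^k·(1−λ), so with a fixed global step size the *effective* learning rate for information at lag k is set by the gate's time-scale λ.

```python
import numpy as np

def lag_sensitivity(lam, T):
    """For the leaky recurrence h_t = lam*h_{t-1} + (1-lam)*x_t,
    return |d h_T / d x_{T-1-k}| for lags k = 0..T-1.
    The gate lam induces a per-lag scaling lam**k * (1-lam):
    a fixed global step size becomes a lag-dependent effective one."""
    return np.array([(lam ** k) * (1 - lam) for k in range(T)])

fast = lag_sensitivity(0.5, 5)   # short-time-scale gate: strong at lag 0
slow = lag_sensitivity(0.9, 5)   # long-time-scale gate: stronger at long lags
```

A fast gate updates aggressively on recent inputs but barely on distant ones; a slow gate does the reverse, which is the state/parameter time-scale coupling in miniature.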

Efficient Mixture-of-Experts LLM Inference with Apple Silicon NPUs

arXiv:2604.18788v1 Announce Type: new Abstract: Apple Neural Engine (ANE) is a dedicated neural processing unit (NPU) present in every Apple Silicon chip. Mixture-of-Experts (MoE) LLMs improve inference efficiency via sparse activation but are challenging for NPUs in three ways: expert…
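
The sparse activation that makes MoE inference NPU-unfriendly can be sketched as a standard top-k router (a generic MoE forward pass, with `gate_w` and `experts` as hypothetical placeholders, not the paper's ANE-specific scheme): only k experts execute per input, which saves compute but produces the data-dependent, irregular execution that fixed-function NPUs dislike.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Sparse MoE layer sketch: route input x to the top-k experts by
    gate logits and combine their outputs with softmax-renormalized
    weights. Only the selected experts run -- the sparsity that makes
    MoE cheap per token but irregular for NPU execution."""
    logits = gate_w @ x                      # (num_experts,)
    top = np.argsort(logits)[-k:]            # indices of the top-k experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                             # renormalize over selected experts
    return sum(wi * experts[i](x) for wi, i in zip(w, top))
```

With k=1 this degenerates to hard routing: exactly one expert's weights must be resident, which is one face of the expert-loading challenge the abstract alludes to.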

Optimized Architectures for Kolmogorov-Arnold Networks

arXiv:2512.12448v2 Announce Type: replace Abstract: Efforts to improve Kolmogorov–Arnold networks (KANs) with architectural enhancements have been stymied by the complexity those enhancements bring, undermining the interpretability that makes KANs attractive in the first place. Here we study overprovisioned architectures combined…
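
For readers unfamiliar with the architecture being enhanced: a KAN puts learnable *univariate* functions on edges instead of scalar weights, and each output simply sums its incoming edge functions. A minimal sketch with a fixed polynomial basis (an illustrative parameterization; real KANs typically use splines, and this is not the paper's overprovisioned variant):

```python
import numpy as np

def kan_layer(x, coeffs, basis=None):
    """Minimal Kolmogorov-Arnold layer sketch: edge (i -> j) applies a
    learnable univariate function phi_ji to input x_i, parameterized by
    coefficients over a fixed basis; output j sums its incoming edge
    functions. coeffs has shape (out_dim, in_dim, n_basis)."""
    if basis is None:
        # toy basis {1, t, t^2}; spline bases are the usual choice
        basis = [lambda t: np.ones_like(t), lambda t: t, lambda t: t * t]
    B = np.stack([b(x) for b in basis])      # (n_basis, in_dim)
    # phi_ji(x_i) = sum_m coeffs[j, i, m] * B[m, i]; then sum over i
    return np.einsum('jim,mi->j', coeffs, B)
```

Interpretability comes from being able to plot each learned phi_ji directly, which is why the abstract treats added architectural complexity as a cost.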

HELM: Harness-Enhanced Long-horizon Memory for Vision-Language-Action Manipulation

arXiv:2604.18791v1 Announce Type: new Abstract: Vision-Language-Action (VLA) models fail systematically on long-horizon manipulation tasks despite strong short-horizon performance. We show that this failure is not resolved by extending context length alone in the current reactive execution setting; instead, it stems…

GAIN: Multiplicative Modulation for Domain Adaptation

arXiv:2604.04516v2 Announce Type: replace Abstract: Adapting LLMs to new domains causes forgetting because standard methods (e.g., full fine-tuning, LoRA) inject new directions into the weight space. We show that forgetting is governed by one algebraic property: whether the update preserves…
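
The additive-vs-multiplicative contrast can be made concrete with two toy update rules. The multiplicative form below (per-row rescaling) is an assumption for illustration only, since the abstract's key algebraic property is truncated; it is not GAIN's actual parameterization. The point it illustrates: an additive low-rank update injects new directions into the weight matrix, while a row-rescaling update leaves every row's direction, and hence the row space, unchanged.

```python
import numpy as np

def additive_update(W, B, A):
    """LoRA-style additive update W + B @ A: injects the new
    directions col(B) into the weight space, which can rotate
    outputs even for inputs the base model already handled."""
    return W + B @ A

def multiplicative_update(W, m):
    """One plausible multiplicative modulation (an assumption here,
    not the paper's exact scheme): rescale each row of W by m[j],
    preserving the direction of every row of W."""
    return np.diag(m) @ W
```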

Preserving Clusters in Error-Bounded Lossy Compression of Particle Data

arXiv:2604.18801v1 Announce Type: new Abstract: Lossy compression is widely used to reduce storage and I/O costs for large-scale particle datasets in scientific applications such as cosmology, molecular dynamics, and fluid dynamics, where clustering structures (e.g., single-linkage or Friends-of-Friends) are critical…
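
"Error-bounded" here means a pointwise guarantee, which a generic quantizer already provides (a standard uniform-quantization sketch, not the paper's cluster-preserving method): binning with width 2ε guarantees each decompressed coordinate is within ε of the original. The paper's point is that this alone is insufficient, since a pointwise bound can still merge or split clusters whose linking distance sits within 2ε of the threshold.

```python
import numpy as np

def compress(pos, eps):
    """Error-bounded lossy compression sketch: uniform quantization
    with bin width 2*eps guarantees |decompress(compress(p)) - p| <= eps
    per coordinate. Integer codes are what an entropy coder would
    then shrink; cluster preservation needs constraints beyond this."""
    return np.round(pos / (2 * eps)).astype(np.int64)

def decompress(q, eps):
    """Map integer bin codes back to bin-center coordinates."""
    return q * (2 * eps)
```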

On the Generalizability of Foundation Models for Crop Type Mapping

arXiv:2409.09451v5 Announce Type: replace-cross Abstract: Foundation models pre-trained using self-supervised learning have shown powerful transfer learning capabilities on various downstream tasks, including language understanding, text generation, and image recognition. The Earth observation (EO) field has produced several foundation models pre-trained…