Archives AI News

Expressivity-Efficiency Tradeoffs for Hybrid Sequence Models

arXiv:2603.08859v1 Announce Type: new Abstract: Hybrid sequence models, which combine Transformer and state-space model layers, seek to pair the expressive versatility of attention with the computational efficiency of state-space layers. Despite burgeoning interest in hybrid models, we lack a basic…

The Temporal Markov Transition Field

arXiv:2603.08803v1 Announce Type: new Abstract: The Markov Transition Field (MTF), introduced by Wang and Oates (2015), encodes a time series as a two-dimensional image by mapping each pair of time steps to the transition probability between their quantile states, estimated…
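The classic MTF construction referenced in this abstract (Wang and Oates, 2015) can be sketched in a few lines of NumPy: bin the series into quantile states, estimate a first-order Markov transition matrix from adjacent steps, then index that matrix by every pair of time steps. This is a minimal illustration of the original MTF, not of the temporal variant the paper introduces; the function name and bin count are illustrative choices.

```python
import numpy as np

def markov_transition_field(x, n_bins=8):
    """Sketch of the Markov Transition Field (Wang & Oates, 2015).

    Each time step is assigned to a quantile bin; a first-order Markov
    transition matrix W between bins is estimated from adjacent steps;
    the MTF image maps each pair (i, j) of time steps to W[q(x_i), q(x_j)].
    """
    x = np.asarray(x, dtype=float)
    # Quantile binning: interior bin edges at equal-probability quantiles.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    q = np.digitize(x, edges)  # bin index in [0, n_bins - 1] per step

    # Estimate the transition matrix W from adjacent pairs of steps.
    W = np.zeros((n_bins, n_bins))
    for t in range(len(x) - 1):
        W[q[t], q[t + 1]] += 1
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

    # MTF: M[i, j] = W[q[i], q[j]] for every pair of time steps.
    return W[np.ix_(q, q)]

mtf = markov_transition_field(np.sin(np.linspace(0, 6 * np.pi, 100)))
print(mtf.shape)  # (100, 100): one transition probability per step pair
```

The resulting image has one row and column per time step, so a length-n series becomes an n×n array of transition probabilities in [0, 1].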

Multi-level meta-reinforcement learning with skill-based curriculum

arXiv:2603.08773v1 Announce Type: new Abstract: We consider problems in sequential decision making with natural multi-level structure, where sub-tasks are assembled to accomplish complex goals. Systematically inferring and leveraging hierarchical structure has remained a longstanding challenge; we describe an efficient…

SPREAD: Subspace Representation Distillation for Lifelong Imitation Learning

arXiv:2603.08763v1 Announce Type: new Abstract: A key challenge in lifelong imitation learning (LIL) is enabling agents to acquire new skills from expert demonstrations while retaining prior knowledge. This requires preserving the low-dimensional manifolds and geometric structures that underlie task representations…

Generalized Reduction to the Isotropy for Flexible Equivariant Neural Fields

arXiv:2603.08758v1 Announce Type: new Abstract: Many geometric learning problems require invariants on heterogeneous product spaces, i.e., products of distinct spaces carrying different group actions, where standard techniques do not directly apply. We show that, when a group $G$ acts transitively…

Scalable Training of Mixture-of-Experts Models with Megatron Core

arXiv:2603.07685v2 Announce Type: replace-cross Abstract: Scaling Mixture-of-Experts (MoE) training introduces systems challenges absent in dense models. Because each token activates only a subset of experts, this sparsity allows total parameters to grow much faster than per-token computation, creating coupled constraints…
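The sparsity the abstract describes, where each token activates only a top-k subset of experts so parameter count grows much faster than per-token compute, can be illustrated with a toy routing loop. This is a generic top-k MoE sketch in NumPy, not Megatron Core's implementation; all names here are hypothetical.

```python
import numpy as np

def topk_moe_forward(x, gate_w, experts, k=2):
    """Toy top-k Mixture-of-Experts forward pass (illustrative only).

    A linear gate scores all experts per token; only the k highest-scoring
    experts process each token, so per-token compute scales with k rather
    than with the total number of experts.
    """
    logits = x @ gate_w                        # [tokens, n_experts]
    topk = np.argsort(logits, axis=1)[:, -k:]  # indices of the k best experts
    # Softmax over the selected logits only, to weight expert outputs.
    sel = np.take_along_axis(logits, topk, axis=1)
    weights = np.exp(sel - sel.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)

    out = np.zeros_like(x)
    for e, expert in enumerate(experts):
        token_idx, slot = np.nonzero(topk == e)  # tokens routed to expert e
        if len(token_idx) == 0:
            continue  # this expert is idle for the current batch
        out[token_idx] += weights[token_idx, slot, None] * expert(x[token_idx])
    return out

rng = np.random.default_rng(0)
d, n_experts = 16, 4
experts = [lambda h, W=rng.standard_normal((d, d)) / np.sqrt(d): h @ W
           for _ in range(n_experts)]
x = rng.standard_normal((8, d))
y = topk_moe_forward(x, rng.standard_normal((d, n_experts)), experts, k=2)
print(y.shape)  # (8, 16)
```

With k=2 of 4 experts, each token touches half the expert parameters; growing `n_experts` adds capacity without changing per-token FLOPs, which is exactly the decoupling that creates the systems constraints the abstract mentions.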