Archives AI News

Improved Bounds for Private and Robust Alignment

arXiv:2512.23816v1 Announce Type: new Abstract: In this paper, we study the private and robust alignment of language models from a theoretical perspective by establishing upper bounds on the suboptimality gap in both offline and online settings. We consider preference labels…

January 1, 2026

DiRe: Diversity-promoting Regularization for Dataset Condensation

arXiv:2512.13083v2 Announce Type: replace-cross Abstract: In Dataset Condensation, the goal is to synthesize a small dataset that replicates the training utility of a large original dataset. Existing condensation methods synthesize datasets with significant redundancy, so there is a dire need…

January 1, 2026

MS-SSM: A Multi-Scale State Space Model for Efficient Sequence Modeling

arXiv:2512.23824v1 Announce Type: new Abstract: State-space models (SSMs) have recently gained attention as an efficient alternative to computationally expensive attention-based models for sequence modeling. They rely on linear recurrences to integrate information over time, enabling fast inference, parallelizable training, and control…

January 1, 2026

A New Decomposition Paradigm for Graph-structured Nonlinear Programs via Message Passing

arXiv:2512.24676v1 Announce Type: cross Abstract: We study finite-sum nonlinear programs whose decision variables interact locally according to a graph or hypergraph. We propose MP-Jacobi (Message Passing-Jacobi), a graph-compliant decentralized framework that couples min-sum message passing with Jacobi block updates. The…

January 1, 2026

Exploiting the Prior of Generative Time Series Imputation

arXiv:2512.23832v1 Announce Type: new Abstract: Time series imputation, i.e., filling the missing values of a time recording, finds various applications in electricity, finance, and weather modelling. Previous methods have introduced generative models such as diffusion probabilistic models and Schrödinger bridge…

January 1, 2026

mHC: Manifold-Constrained Hyper-Connections

arXiv:2512.24880v1 Announce Type: cross Abstract: Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and diversifying connectivity patterns. While yielding substantial performance gains, this diversification…

January 1, 2026

Trellis: Learning to Compress Key-Value Memory in Attention Models

arXiv:2512.23852v1 Announce Type: new Abstract: Transformers, while powerful, suffer from quadratic computational complexity and the ever-growing Key-Value (KV) cache of the attention mechanism. This paper introduces Trellis, a novel Transformer architecture with bounded memory that learns how to compress its…

January 1, 2026

Basic Inequalities for First-Order Optimization with Applications to Statistical Risk Analysis

arXiv:2512.24999v1 Announce Type: cross Abstract: We introduce basic inequalities for first-order iterative optimization algorithms, forming a simple and versatile framework that connects implicit and explicit regularization. While related inequalities appear in the literature, we isolate and highlight a specific form…

January 1, 2026

Flow Matching Neural Processes

arXiv:2512.23853v1 Announce Type: new Abstract: Neural processes (NPs) are a class of models that learn stochastic processes directly from data and can be used for inference, sampling and conditional sampling. We introduce a new NP model based on flow matching,…

January 1, 2026

Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling

arXiv:2402.18508v3 Announce Type: replace Abstract: In the rapidly evolving field of deep learning, the demand for models that are both expressive and computationally efficient has never been more critical. This paper introduces Orchid, a novel architecture designed to address the…

January 1, 2026