Archives AI News

KANMixer: a minimal KAN-centered mixer for long-term time series forecasting

arXiv:2508.01575v2 Announce Type: replace Abstract: Long-term time series forecasting (LTSF) underpins critical applications from energy management to weather prediction, yet achieving reliable multi-step-ahead accuracy remains challenging. Existing LTSF approaches, dominated by MLP- and Transformer-based architectures, either rely on simple linear…

April 23, 2026

Eventually LIL Regret: Almost Sure $\ln\ln T$ Regret for a sub-Gaussian Mixture on Unbounded Data

arXiv:2512.12325v3 Announce Type: replace Abstract: We prove that a classic sub-Gaussian mixture proposed by Robbins in a stochastic setting actually satisfies a path-wise (deterministic) regret bound. For every path in a natural “Ville event” $\mathcal{E}_\alpha$, this regret till time…

April 23, 2026

An explicit operator explains end-to-end computation in the modern neural networks used for sequence and language modeling

arXiv:2604.20595v1 Announce Type: cross Abstract: We establish a mathematical correspondence between state space models, a state-of-the-art architecture for capturing long-range dependencies in data, and an exactly solvable nonlinear oscillator network. As a specific example of this general correspondence, we analyze…

April 23, 2026

Fairness-Aware Multi-Group Target Detection in Online Discussion

arXiv:2407.11933v4 Announce Type: replace Abstract: Target-group detection is the task of detecting which group(s) a piece of content is “directed at or about”. Applications include targeted marketing, content recommendation, and group-specific content assessment. Key challenges include: 1) that a single…

April 23, 2026

DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

arXiv:2604.19859v1 Announce Type: new Abstract: Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep…

April 23, 2026

Super Apriel: One Checkpoint, Many Speeds

arXiv:2604.19877v1 Announce Type: new Abstract: We release Super Apriel, a 15B-parameter supernet in which every decoder layer provides four trained mixer choices — Full Attention (FA), Sliding Window Attention (SWA), Kimi Delta Attention (KDA), and Gated DeltaNet (GDN). A placement…

April 23, 2026

Graph-Theoretic Models for the Prediction of Molecular Measurements

arXiv:2604.19840v1 Announce Type: new Abstract: Graph-theoretic approaches offer simplicity, interpretability, and low computational cost for molecular property prediction. Among these, the model proposed by Mukwembi and Nyabadza, based on the external activity $D(G)$ and internal activity $\zeta(G)$ indices, achieved strong…

April 23, 2026

Rethinking Reinforcement Fine-Tuning in LVLM: Convergence, Reward Decomposition, and Generalization

arXiv:2604.19857v1 Announce Type: new Abstract: Reinforcement fine-tuning with verifiable rewards (RLVR) has emerged as a powerful paradigm for equipping large vision-language models (LVLMs) with agentic capabilities such as tool use and multi-step reasoning. Despite striking empirical successes, most notably Visual…

April 23, 2026

Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts

arXiv:2604.19835v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) has become the dominant architecture for scaling large language models: frontier models routinely decouple total parameters from per-token computation through sparse expert routing. Scaling laws show that under fixed active computation, model quality…

April 23, 2026

On-Meter Graph Machine Learning: A Case Study of PV Power Forecasting for Grid Edge Intelligence

arXiv:2604.19800v1 Announce Type: new Abstract: This paper presents a detailed study of how graph neural networks can be used on edge intelligent meters in a microgrid to forecast photovoltaic power generation. The problem background and the adopted technologies are introduced,…

April 23, 2026