Archives AI News

KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization

arXiv:2603.10085v1 Announce Type: new Abstract: Improving GPU kernel efficiency is crucial for advancing AI systems. Recent work has explored leveraging large language models (LLMs) for GPU kernel generation and optimization. However, existing LLM-based kernel optimization pipelines typically rely on opaque,…

March 12, 2026

Score Matching Diffusion Based Feedback Control and Planning of Nonlinear Systems

arXiv:2504.09836v2 Announce Type: replace-cross Abstract: In this paper, we propose a deterministic diffusion-based framework for controlling the probability density of nonlinear control-affine systems, with theoretical guarantees for drift-free and linear time-invariant (LTI) dynamics. The central idea is to first excite…

March 12, 2026

ES-dLLM: Efficient Inference for Diffusion Large Language Models by Early-Skipping

arXiv:2603.10088v1 Announce Type: new Abstract: Diffusion large language models (dLLMs) are emerging as a promising alternative to autoregressive models (ARMs) due to their ability to capture bidirectional context and the potential for parallel generation. Despite the advantages, dLLM inference remains…

March 12, 2026

KV Cache Transform Coding for Compact Storage in LLM Inference

arXiv:2511.01815v2 Announce Type: replace-cross Abstract: Serving large language models (LLMs) at scale necessitates efficient key-value (KV) cache management. KV caches can be reused across conversation turns via shared-prefix prompts that are common in iterative code editing and chat. However, stale…

March 12, 2026

A Survey of Weight Space Learning: Understanding, Representation, and Generation

arXiv:2603.10090v1 Announce Type: new Abstract: Neural network weights are typically viewed as the end product of training, while most deep learning research focuses on data, features, and architectures. However, recent advances show that the set of all possible weight values…

March 12, 2026

GOT-JEPA: Generic Object Tracking with Model Adaptation and Occlusion Handling using Joint-Embedding Predictive Architecture

arXiv:2602.14771v2 Announce Type: replace-cross Abstract: The human visual system tracks objects by integrating current observations with previously observed information, adapting to target and scene changes, and reasoning about occlusion at fine granularity. In contrast, recent generic object trackers are often…

March 12, 2026

Equivariant Asynchronous Diffusion: An Adaptive Denoising Schedule for Accelerated Molecular Conformation Generation

arXiv:2603.10093v1 Announce Type: new Abstract: Recent 3D molecular generation methods primarily use asynchronous auto-regressive or synchronous diffusion models. While auto-regressive models build molecules sequentially, they’re limited by a short horizon and a discrepancy between training and inference. Conversely, synchronous diffusion…

March 12, 2026

Quantum entanglement provides a competitive advantage in adversarial games

arXiv:2603.10289v1 Announce Type: cross Abstract: Whether uniquely quantum resources confer advantages in fully classical, competitive environments remains an open question. Competitive zero-sum reinforcement learning is particularly challenging, as success requires modelling dynamic interactions between opposing agents rather than static state-action…

March 12, 2026

Rethinking Adam for Time Series Forecasting: A Simple Heuristic to Improve Optimization under Distribution Shifts

arXiv:2603.10095v1 Announce Type: new Abstract: Time-series forecasting often faces challenges from non-stationarity, particularly distributional drift, where the data distribution evolves over time. This dynamic behavior can undermine the effectiveness of adaptive optimizers, such as Adam, which are typically designed for…

March 12, 2026

Dual Space Preconditioning for Gradient Descent in the Overparameterized Regime

arXiv:2603.10485v1 Announce Type: cross Abstract: In this work we study the convergence properties of the Dual Space Preconditioned Gradient Descent, encompassing optimizers such as Normalized Gradient Descent, Gradient Clipping and Adam. We consider preconditioners of the form $\nabla K$, where…

March 12, 2026