Archives AI News

Zero-Order Optimization for LLM Fine-Tuning via Learnable Direction Sampling

arXiv:2602.13659v1 Announce Type: new Abstract: Fine-tuning large pretrained language models (LLMs) is a cornerstone of modern NLP, yet its growing memory demands (driven by backpropagation and large optimizer states) limit deployment in resource-constrained settings. Zero-order (ZO) methods bypass backpropagation by…

Evolution Strategies at the Hyperscale

arXiv:2511.16652v2 Announce Type: replace Abstract: Evolution Strategies (ES) is a class of powerful black-box optimisation methods that are highly parallelisable and can handle non-differentiable and noisy objectives. However, naïve ES becomes prohibitively expensive at scale on GPUs due to the…

Optimized Certainty Equivalent Risk-Controlling Prediction Sets

arXiv:2602.13660v1 Announce Type: new Abstract: In safety-critical applications such as medical image segmentation, prediction systems must provide reliability guarantees that extend beyond conventional expected loss control. While risk-controlling prediction sets (RCPS) offer probabilistic guarantees on the expected risk, they fail…

Efficient Tensor Completion Algorithms for Highly Oscillatory Operators

arXiv:2510.17734v3 Announce Type: replace-cross Abstract: This paper presents low-complexity tensor completion algorithms and their efficient implementation to reconstruct highly oscillatory operators discretized as $n \times n$ matrices. The underlying tensor decomposition is based on the reshaping of the input matrix and…

Deep Two-Way Matrix Reordering for Relational Data Analysis

arXiv:2103.14203v5 Announce Type: replace-cross Abstract: Matrix reordering is a task to permute the rows and columns of a given observed matrix such that the resulting reordered matrix shows meaningful or interpretable structural patterns. Most existing matrix reordering techniques share the…

DiffusionNFT: Online Diffusion Reinforcement with Forward Process

arXiv:2509.16117v2 Announce Type: replace Abstract: Online reinforcement learning (RL) has been central to post-training language models, but its extension to diffusion models remains challenging due to intractable likelihoods. Recent works discretize the reverse sampling process to enable GRPO-style training, yet…

Critic-Guided Reinforcement Unlearning in Text-to-Image Diffusion

arXiv:2601.03213v3 Announce Type: replace Abstract: Machine unlearning in text-to-image diffusion models aims to remove targeted concepts while preserving overall utility. Prior diffusion unlearning methods typically rely on supervised weight edits or global penalties; reinforcement-learning (RL) approaches, while flexible, often optimize…

Robust Multi-Objective Controlled Decoding of Large Language Models

arXiv:2503.08796v2 Announce Type: replace Abstract: We introduce Robust Multi-Objective Decoding (RMOD), a novel inference-time algorithm that robustly aligns Large Language Models (LLMs) to multiple human objectives (e.g., instruction-following, helpfulness, safety) by maximizing the worst-case rewards. RMOD formulates the robust decoding…