Archives AI News

Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning

arXiv:2505.16270v2 Announce Type: replace-cross Abstract: Large language models are typically adapted to downstream tasks through supervised fine-tuning on domain-specific data. While standard fine-tuning focuses on minimizing generation loss to optimize model parameters, we take a deeper step by retaining and…

November 17, 2025

Behaviour Policy Optimization: Provably Lower Variance Return Estimates for Off-Policy Reinforcement Learning

arXiv:2511.10843v1 Announce Type: new Abstract: Many reinforcement learning algorithms, particularly those that rely on return estimates for policy improvement, can suffer from poor sample efficiency and training instability due to high-variance return estimates. In this paper we leverage new results…

November 17, 2025

Augmented data and neural networks for robust epidemic forecasting: application to COVID-19 in Italy

arXiv:2510.09192v2 Announce Type: replace-cross Abstract: In this work, we propose a data augmentation strategy aimed at improving the training phase of neural networks and, consequently, the accuracy of their predictions. Our approach relies on generating synthetic data through a suitable…

November 17, 2025

STAMP: Spatial-Temporal Adapter with Multi-Head Pooling

arXiv:2511.10848v1 Announce Type: new Abstract: Time series foundation models (TSFMs) pretrained on data from multiple domains have shown strong performance on diverse modeling tasks. Various efforts have been made to develop foundation models specific to electroencephalography (EEG) data, which records…

November 17, 2025

On the Relationship Between Adversarial Robustness and Decision Region in Deep Neural Networks

arXiv:2207.03400v3 Announce Type: replace Abstract: In general, Deep Neural Networks (DNNs) are evaluated by the generalization performance measured on unseen data excluded from the training phase. Along with the development of DNNs, the generalization performance converges to the state-of-the-art and…

November 17, 2025

ExPairT-LLM: Exact Learning for LLM Code Selection by Pairwise Queries

arXiv:2511.10855v1 Announce Type: new Abstract: Despite recent advances in LLMs, the task of code generation is still challenging. To cope, code selection algorithms select the best program from multiple programs generated by an LLM. However, existing algorithms can fail to…

November 17, 2025

Provable Domain Adaptation for Offline Reinforcement Learning with Limited Samples

arXiv:2408.12136v4 Announce Type: replace Abstract: Offline reinforcement learning (RL) learns effective policies from a static target dataset. The performance of state-of-the-art offline RL algorithms notwithstanding, it relies on the size of the target dataset, and it degrades if limited samples…

November 17, 2025

Private Zeroth-Order Optimization with Public Data

arXiv:2511.10859v1 Announce Type: new Abstract: One of the major bottlenecks for deploying popular first-order differentially private (DP) machine learning algorithms (e.g., DP-SGD) lies in their high computation and memory cost, despite the existence of optimized implementations. Zeroth-order methods have promise…

November 17, 2025

Bi-Level Contextual Bandits for Individualized Resource Allocation under Delayed Feedback

arXiv:2511.10572v2 Announce Type: replace-cross Abstract: Equitably allocating limited resources in high-stakes domains (such as education, employment, and healthcare) requires balancing short-term utility with long-term impact, while accounting for delayed outcomes, hidden heterogeneity, and ethical constraints. However, most learning-based allocation frameworks either assume…

November 17, 2025

The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent

arXiv:2502.13961v4 Announce Type: replace-cross Abstract: Understanding the advantages of deep neural networks trained by gradient descent (GD) compared to shallow models remains an open theoretical challenge. In this paper, we introduce a class of target functions (single and multi-index Gaussian…

November 17, 2025