Archives AI News

Regret Bounds for Reinforcement Learning from Multi-Source Imperfect Preferences

arXiv:2603.20453v2. Announce Type: replace. Abstract: Reinforcement learning from human feedback (RLHF) replaces hard-to-specify rewards with pairwise trajectory preferences, yet regret-oriented theory often assumes that preference labels are generated consistently from a single ground-truth objective. In practical RLHF systems, however, feedback…

DiffGradCAM: A Universal Class Activation Map Resistant to Adversarial Training

arXiv:2506.08514v3. Announce Type: replace. Abstract: Class Activation Mapping (CAM) and its gradient-based variants (e.g., GradCAM) have become standard tools for explaining Convolutional Neural Network (CNN) predictions. However, these approaches typically focus on individual logits, while for neural networks using softmax,…

Efficient and Principled Scientific Discovery through Bayesian Optimization: A Tutorial

arXiv:2604.01328v1. Announce Type: new. Abstract: Traditional scientific discovery relies on an iterative hypothesise-experiment-refine cycle that has driven progress for centuries, but its intuitive, ad-hoc implementation often wastes resources, yields inefficient designs, and misses critical insights. This tutorial presents Bayesian Optimisation…

Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing

arXiv:2208.02389v3. Announce Type: replace. Abstract: Motivated by practical considerations in machine learning for financial decision-making, such as risk aversion and large action space, we consider risk-aware bandits optimization with applications in smart order routing (SOR). Specifically, based on preliminary observations…

Forecasting Supply Chain Disruptions with Foresight Learning

arXiv:2604.01298v1. Announce Type: new. Abstract: Anticipating supply chain disruptions before they materialize is a core challenge for firms and policymakers alike. A key difficulty is learning to reason reliably about infrequent, high-impact events from noisy and unstructured inputs – a…