Archives AI News

DCV-ROOD Evaluation Framework: Dual Cross-Validation for Robust Out-of-Distribution Detection

arXiv:2509.05778v1 Announce Type: cross Abstract: Out-of-distribution (OOD) detection plays a key role in enhancing the robustness of artificial intelligence systems by identifying inputs that differ significantly from the training distribution, thereby preventing unreliable predictions and enabling appropriate fallback mechanisms. Developing reliable OOD detection methods is a significant challenge, and rigorous evaluation of these techniques is essential for ensuring their effectiveness, as it allows researchers to assess their performance under diverse conditions and to identify potential limitations or failure modes. Cross-validation (CV) has proven to be a highly effective tool for providing a reasonable estimate of the performance of a learning algorithm. Although OOD scenarios exhibit particular characteristics, an appropriate adaptation of CV can lead to a suitable evaluation framework for this setting. This work proposes a dual CV framework for robust evaluation of OOD detection models, aimed at improving the reliability of their assessment. The proposed evaluation framework aims to effectively integrate in-distribution (ID) and OOD data while accounting for their differing characteristics. To achieve this, ID data are partitioned using a conventional approach, whereas OOD data are divided by grouping samples based on their classes. Furthermore, we analyze the context of data with class hierarchy to propose a data splitting that considers the entire class hierarchy to obtain fair ID-OOD partitions to apply the proposed evaluation framework. This framework is called Dual Cross-Validation for Robust Out-of-Distribution Detection (DCV-ROOD). To test the validity of the evaluation framework, we selected a set of state-of-the-art OOD detection methods, both with and without outlier exposure. The results show that the method achieves very fast convergence to the true performance.

September 9, 2025

Sequential Controlled Langevin Diffusions

arXiv:2412.07081v2 Announce Type: replace Abstract: An effective approach for sampling from unnormalized densities is based on the idea of gradually transporting samples from an easy prior to the complicated target distribution. Two popular methods are (1) Sequential Monte Carlo (SMC), where the transport is performed through successive annealed densities via prescribed Markov chains and resampling steps, and (2) recently developed diffusion-based sampling methods, where a learned dynamical transport is used. Despite the common goal, both approaches have different, often complementary, advantages and drawbacks. The resampling steps in SMC allow focusing on promising regions of the space, often leading to robust performance. While the algorithm enjoys asymptotic guarantees, the lack of flexible, learnable transitions can lead to slow convergence. On the other hand, diffusion-based samplers are learned and can potentially better adapt themselves to the target at hand, yet often suffer from training instabilities. In this work, we present a principled framework for combining SMC with diffusion-based samplers by viewing both methods in continuous time and considering measures on path space. This culminates in the new Sequential Controlled Langevin Diffusion (SCLD) sampling method, which is able to utilize the benefits of both methods and reaches improved performance on multiple benchmark problems, in many cases using only 10% of the training budget of previous diffusion-based samplers.

September 9, 2025

Beyond ATE: Multi-Criteria Design for A/B Testing

arXiv:2509.05864v1 Announce Type: cross Abstract: A/B testing is a widely adopted methodology for estimating conditional average treatment effects (CATEs) in both clinical trials and online platforms. While most existing research has focused primarily on maximizing estimation accuracy, practical applications must also account for additional objectives-most notably welfare or revenue loss. In many settings, it is critical to administer treatments that improve patient outcomes or to implement plans that generate greater revenue from customers. Within a machine learning framework, such objectives are naturally captured through the notion of cumulative regret. In this paper, we investigate the fundamental trade-off between social welfare loss and statistical accuracy in (adaptive) experiments with heterogeneous treatment effects. We establish matching upper and lower bounds for the resulting multi-objective optimization problem and employ the concept of Pareto optimality to characterize the necessary and sufficient conditions for optimal experimental designs. Beyond estimating CATEs, practitioners often aim to deploy treatment policies that maximize welfare across the entire population. We demonstrate that our Pareto-optimal adaptive design achieves optimal post-experiment welfare, irrespective of the in-experiment trade-off between accuracy and welfare. Furthermore, since clinical and commercial data are often highly sensitive, it is essential to incorporate robust privacy guarantees into any treatment-allocation mechanism. To this end, we develop differentially private algorithms that continue to achieve our established lower bounds, showing that privacy can be attained at negligible cost.

September 9, 2025

Uncertainty quantification for Markov chain induced martingales with application to temporal difference learning

arXiv:2502.13822v2 Announce Type: replace Abstract: We establish novel and general high-dimensional concentration inequalities and Berry-Esseen bounds for vector-valued martingales induced by Markov chains. We apply these results to analyze the performance of the Temporal Difference (TD) learning algorithm with linear function approximations, a widely used method for policy evaluation in Reinforcement Learning (RL), obtaining a sharp high-probability consistency guarantee that matches the asymptotic variance up to logarithmic factors. Furthermore, we establish an $O(T^{-frac{1}{4}}log T)$ distributional convergence rate for the Gaussian approximation of the TD estimator, measured in convex distance. Our martingale bounds are of broad applicability, and our analysis of TD learning provides new insights into statistical inference for RL algorithms, bridging gaps between classical stochastic approximation theory and modern RL applications.

September 9, 2025

The Measure of Deception: An Analysis of Data Forging in Machine Unlearning

arXiv:2509.05865v1 Announce Type: cross Abstract: Motivated by privacy regulations and the need to mitigate the effects of harmful data, machine unlearning seeks to modify trained models so that they effectively ``forget'' designated data. A key challenge in verifying unlearning is forging -- adversarially crafting data that mimics the gradient of a target point, thereby creating the appearance of unlearning without actually removing information. To capture this phenomenon, we consider the collection of data points whose gradients approximate a target gradient within tolerance $epsilon$ -- which we call an $epsilon$-forging set -- and develop a framework for its analysis. For linear regression and one-layer neural networks, we show that the Lebesgue measure of this set is small. It scales on the order of $epsilon$, and when $epsilon$ is small enough, $epsilon^d$. More generally, under mild regularity assumptions, we prove that the forging set measure decays as $epsilon^{(d-r)/2}$, where $d$ is the data dimension and $r<d$ is the nullity of a variation matrix defined by the model gradients. Extensions to batch SGD and almost-everywhere smooth loss functions yield the same asymptotic scaling. In addition, we establish probability bounds showing that, under non-degenerate data distributions, the likelihood of randomly sampling a forging point is vanishingly small. These results provide evidence that adversarial forging is fundamentally limited and that false unlearning claims can, in principle, be detected.

September 9, 2025

Predicting Market Troughs: A Machine Learning Approach with Causal Interpretation

arXiv:2509.05922v1 Announce Type: cross Abstract: This paper provides robust, new evidence on the causal drivers of market troughs. We demonstrate that conclusions about these triggers are critically sensitive to model specification, moving beyond restrictive linear models with a flexible DML average partial effect causal machine learning framework. Our robust estimates identify the volatility of options-implied risk appetite and market liquidity as key causal drivers, relationships misrepresented or obscured by simpler models. These findings provide high-frequency empirical support for intermediary asset pricing theories. This causal analysis is enabled by a high-performance nowcasting model that accurately identifies capitulation events in real-time.

September 9, 2025

Smoothed Online Optimization for Target Tracking: Robust and Learning-Augmented Algorithms

arXiv:2509.05930v1 Announce Type: cross Abstract: We introduce the Smoothed Online Optimization for Target Tracking (SOOTT) problem, a new framework that integrates three key objectives in online decision-making under uncertainty: (1) tracking cost for following a dynamically moving target, (2) adversarial perturbation cost for withstanding unpredictable disturbances, and (3) switching cost for penalizing abrupt changes in decisions. This formulation captures real-world scenarios such as elastic and inelastic workload scheduling in AI clusters, where operators must balance long-term service-level agreements (e.g., LLM training) against sudden demand spikes (e.g., real-time inference). We first present BEST, a robust algorithm with provable competitive guarantees for SOOTT. To enhance practical performance, we introduce CoRT, a learning-augmented variant that incorporates untrusted black-box predictions (e.g., from ML models) into its decision process. Our theoretical analysis shows that CoRT strictly improves over BEST when predictions are accurate, while maintaining robustness under arbitrary prediction errors. We validate our approach through a case study on workload scheduling, demonstrating that both algorithms effectively balance trajectory tracking, decision smoothness, and resilience to external disturbances.

September 9, 2025

Sequential Gibbs Posteriors with Applications to Principal Component Analysis

arXiv:2310.12882v2 Announce Type: replace-cross Abstract: Gibbs posteriors are proportional to a prior distribution multiplied by an exponentiated loss function, with a key tuning parameter weighting information in the loss relative to the prior and providing a control of posterior uncertainty. Gibbs posteriors provide a principled framework for likelihood-free Bayesian inference, but in many situations, including a single tuning parameter inevitably leads to poor uncertainty quantification. In particular, regardless of the value of the parameter, credible regions have far from the nominal frequentist coverage even in large samples. We propose a sequential extension to Gibbs posteriors to address this problem. We prove the proposed sequential posterior exhibits concentration and a Bernstein-von Mises theorem, which holds under easy to verify conditions in Euclidean space and on manifolds. As a byproduct, we obtain the first Bernstein-von Mises theorem for traditional likelihood-based Bayesian posteriors on manifolds. All methods are illustrated with an application to principal component analysis.

September 9, 2025

If generative AI is the answer, what is the question?

arXiv:2509.06120v1 Announce Type: cross Abstract: Beginning with text and images, generative AI has expanded to audio, video, computer code, and molecules. Yet, if generative AI is the answer, what is the question? We explore the foundations of generation as a distinct machine learning task with connections to prediction, compression, and decision-making. We survey five major generative model families: autoregressive models, variational autoencoders, normalizing flows, generative adversarial networks, and diffusion models. We then introduce a probabilistic framework that emphasizes the distinction between density estimation and generation. We review a game-theoretic framework with a two-player adversary-learner setup to study generation. We discuss post-training modifications that prepare generative models for deployment. We end by highlighting some important topics in socially responsible generation such as privacy, detection of AI-generated content, and copyright and IP. We adopt a task-first framing of generation, focusing on what generation is as a machine learning problem, rather than only on how models implement it.

September 9, 2025

Data-Efficient Time-Dependent PDE Surrogates: Graph Neural Simulators vs Neural Operators

arXiv:2509.06154v1 Announce Type: cross Abstract: Neural operators (NOs) approximate mappings between infinite-dimensional function spaces but require large datasets and struggle with scarce training data. Many NO formulations don't explicitly encode causal, local-in-time structure of physical evolution. While autoregressive models preserve causality by predicting next time-steps, they suffer from rapid error accumulation. We employ Graph Neural Simulators (GNS) - a message-passing graph neural network framework - with explicit numerical time-stepping schemes to construct accurate forward models that learn PDE solutions by modeling instantaneous time derivatives. We evaluate our framework on three canonical PDE systems: (1) 2D Burgers' scalar equation, (2) 2D coupled Burgers' vector equation, and (3) 2D Allen-Cahn equation. Rigorous evaluations demonstrate GNS significantly improves data efficiency, achieving higher generalization accuracy with substantially fewer training trajectories compared to neural operator baselines like DeepONet and FNO. GNS consistently achieves under 1% relative L2 errors with only 30 training samples out of 1000 (3% of available data) across all three PDE systems. It substantially reduces error accumulation over extended temporal horizons: averaged across all cases, GNS reduces autoregressive error by 82.48% relative to FNO AR and 99.86% relative to DON AR. We introduce a PCA+KMeans trajectory selection strategy enhancing low-data performance. Results indicate combining graph-based local inductive biases with conventional time integrators yields accurate, physically consistent, and scalable surrogate models for time-dependent PDEs.

September 9, 2025