Archives AI News

Statistical Test for Saliency Maps of Graph Neural Networks via Selective Inference

arXiv:2505.16893v2 Announce Type: replace Abstract: Graph Neural Networks (GNNs) have gained prominence for their ability to process graph-structured data across various domains. However, interpreting GNN decisions remains a significant challenge, leading to the adoption of saliency maps for identifying salient subgraphs composed of influential nodes and edges. Despite their utility, the reliability of GNN saliency maps has been questioned, particularly in terms of their robustness to input noise. In this study, we propose a statistical testing framework to rigorously evaluate the significance of saliency maps. Our main contribution lies in addressing the inflation of the Type I error rate caused by double-dipping of data, leveraging the framework of Selective Inference. Our method provides statistically valid $p$-values while controlling the Type I error rate, ensuring that identified salient subgraphs contain meaningful information rather than random artifacts. The method is applicable to a variety of saliency methods with piecewise linearity (e.g., Class Activation Mapping). We validate our method on synthetic and real-world datasets, demonstrating its capability in assessing the reliability of GNN interpretations.

Calibration Prediction Interval for Non-parametric Regression and Neural Networks

arXiv:2509.02735v1 Announce Type: cross Abstract: Accurate conditional prediction in the regression setting plays an important role in many real-world problems. Typically, a point prediction often falls short since no attempt is made to quantify the prediction accuracy. Classically, under the normality and linearity assumptions, the Prediction Interval (PI) for the response variable can be determined routinely based on the $t$ distribution. Unfortunately, these two assumptions are rarely met in practice. To fully avoid these two conditions, we develop a so-called calibration PI (cPI) which leverages estimations by Deep Neural Networks (DNN) or kernel methods. Moreover, the cPI can be easily adjusted to capture the estimation variability within the prediction procedure, which is a crucial error source often ignored in practice. Under regular assumptions, we verify that our cPI has an asymptotically valid coverage rate. We also demonstrate that cPI based on the kernel method ensures a coverage rate with a high probability when the sample size is large. Besides, with several conditions, the cPI based on DNN works even with finite samples. A comprehensive simulation study supports the usefulness of cPI, and the convincing performance of cPI with a short sample is confirmed with two empirical datasets.

When a Reinforcement Learning Agent Encounters Unknown Unknowns

arXiv:2505.13188v2 Announce Type: replace-cross Abstract: An AI agent might surprisingly find she has reached an unknown state which she has never been aware of -- an unknown unknown. We mathematically ground this scenario in reinforcement learning: an agent, after taking an action calculated from value functions $Q$ and $V$ defined on the {it {aware domain}}, reaches a state out of the domain. To enable the agent to handle this scenario, we propose an {it episodic Markov decision {process} with growing awareness} (EMDP-GA) model, taking a new {it noninformative value expansion} (NIVE) approach to expand value functions to newly aware areas: when an agent arrives at an unknown unknown, value functions $Q$ and $V$ whereon are initialised by noninformative beliefs -- the averaged values on the aware domain. This design is out of respect for the complete absence of knowledge in the newly discovered state. The upper confidence bound momentum Q-learning is then adapted to the growing awareness for training the EMDP-GA model. We prove that (1) the regret of our approach is asymptotically consistent with the state of the art (SOTA) without exposure to unknown unknowns in an extremely uncertain environment, and (2) our computational complexity and space complexity are comparable with the SOTA -- these collectively suggest that though an unknown unknown is surprising, it will be asymptotically properly discovered with decent speed and an affordable cost.

Non-Linear Counterfactual Aggregate Optimization

arXiv:2509.03438v1 Announce Type: new Abstract: We consider the problem of directly optimizing a non-linear function of an outcome, where this outcome itself is the sum of many small contributions. The non-linearity of the function means that the problem is not equivalent to the maximization of the expectation of the individual contribution. By leveraging the concentration properties of the sum of individual outcomes, we derive a scalable descent algorithm that directly optimizes for our stated objective. This allows for instance to maximize the probability of successful A/B test, for which it can be wiser to target a success criterion, such as exceeding a given uplift, rather than chasing the highest expected payoff.

Off-Policy Learning in Large Action Spaces: Optimization Matters More Than Estimation

arXiv:2509.03456v1 Announce Type: new Abstract: Off-policy evaluation (OPE) and off-policy learning (OPL) are foundational for decision-making in offline contextual bandits. Recent advances in OPL primarily optimize OPE estimators with improved statistical properties, assuming that better estimators inherently yield superior policies. Although theoretically justified, we argue this estimator-centric approach neglects a critical practical obstacle: challenging optimization landscapes. In this paper, we provide theoretical insights and extensive empirical evidence showing that current OPL methods encounter severe optimization issues, particularly as action spaces become large. We demonstrate that simpler weighted log-likelihood objectives enjoy substantially better optimization properties and still recover competitive, often superior, learned policies. Our findings emphasize the necessity of explicitly addressing optimization considerations in the development of OPL algorithms for large action spaces.

Understanding and Improving the Shampoo Optimizer via Kullback-Leibler Minimization

arXiv:2509.03378v1 Announce Type: new Abstract: As an adaptive method, Shampoo employs a structured second-moment estimation, and its effectiveness has attracted growing attention. Prior work has primarily analyzed its estimation scheme through the Frobenius norm. Motivated by the natural connection between the second moment and a covariance matrix, we propose studying Shampoo's estimation as covariance estimation through the lens of Kullback-Leibler (KL) minimization. This alternative perspective reveals a previously hidden limitation, motivating improvements to Shampoo's design. Building on this insight, we develop a practical estimation scheme, termed KL-Shampoo, that eliminates Shampoo's reliance on Adam for stabilization, thereby removing the additional memory overhead introduced by Adam. Preliminary results show that KL-Shampoo improves Shampoo's performance, enabling it to stabilize without Adam and even outperform its Adam-stabilized variant, SOAP, in neural network pretraining.

Bayesian Additive Regression Trees for functional ANOVA model

arXiv:2509.03317v1 Announce Type: new Abstract: Bayesian Additive Regression Trees (BART) is a powerful statistical model that leverages the strengths of Bayesian inference and regression trees. It has received significant attention for capturing complex non-linear relationships and interactions among predictors. However, the accuracy of BART often comes at the cost of interpretability. To address this limitation, we propose ANOVA Bayesian Additive Regression Trees (ANOVA-BART), a novel extension of BART based on the functional ANOVA decomposition, which is used to decompose the variability of a function into different interactions, each representing the contribution of a different set of covariates or factors. Our proposed ANOVA-BART enhances interpretability, preserves and extends the theoretical guarantees of BART, and achieves superior predictive performance. Specifically, we establish that the posterior concentration rate of ANOVA-BART is nearly minimax optimal, and further provides the same convergence rates for each interaction that are not available for BART. Moreover, comprehensive experiments confirm that ANOVA-BART surpasses BART in both accuracy and uncertainty quantification, while also demonstrating its effectiveness in component selection. These results suggest that ANOVA-BART offers a compelling alternative to BART by balancing predictive accuracy, interpretability, and theoretical consistency.

Scale-Adaptive Generative Flows for Multiscale Scientific Data

arXiv:2509.02971v1 Announce Type: new Abstract: Flow-based generative models can face significant challenges when modeling scientific data with multiscale Fourier spectra, often producing large errors in fine-scale features. We address this problem within the framework of stochastic interpolants, via principled design of noise distributions and interpolation schedules. The key insight is that the noise should not be smoother than the target data distribution -- measured by Fourier spectrum decay rates -- to ensure bounded drift fields near the initial time. For Gaussian and near-Gaussian distributions whose fine-scale structure is known, we show that spectrum-matched noise improves numerical efficiency compared to standard white-noise approaches. For complex non-Gaussian distributions, we develop scale-adaptive interpolation schedules that address the numerical ill-conditioning arising from rougher-than-data noise. Numerical experiments on synthetic Gaussian random fields and solutions to the stochastic Allen-Cahn and Navier-Stokes equations validate our approach and demonstrate its ability to generate high-fidelity samples at lower computational cost than traditional approaches.

Inference on covariance structure in high-dimensional multi-view data

arXiv:2509.02772v1 Announce Type: cross Abstract: This article focuses on covariance estimation for multi-view data. Popular approaches rely on factor-analytic decompositions that have shared and view-specific latent factors. Posterior computation is conducted via expensive and brittle Markov chain Monte Carlo (MCMC) sampling or variational approximations that underestimate uncertainty and lack theoretical guarantees. Our proposed methodology employs spectral decompositions to estimate and align latent factors that are active in at least one view. Conditionally on these factors, we choose jointly conjugate prior distributions for factor loadings and residual variances. The resulting posterior is a simple product of normal-inverse gamma distributions for each variable, bypassing MCMC and facilitating posterior computation. We prove favorable increasing-dimension asymptotic properties, including posterior contraction and central limit theorems for point estimators. We show excellent performance in simulations, including accurate uncertainty quantification, and apply the methodology to integrate four high-dimensional views from a multi-omics dataset of cancer cell samples.

A Composite-Loss Graph Neural Network for the Multivariate Post-Processing of Ensemble Weather Forecasts

arXiv:2509.02784v1 Announce Type: cross Abstract: Ensemble forecasting systems have advanced meteorology by providing probabilistic estimates of future states, supporting applications from renewable energy production to transportation safety. Nonetheless, systematic biases often persist, making statistical post-processing essential. Traditional parametric post-processing techniques and machine learning-based methods can produce calibrated predictive distributions at specific locations and lead times, yet often struggle to capture dependencies across forecast dimensions. To address this, multivariate post-processing methods-such as ensemble copula coupling and the Schaake shuffle-are widely applied in a second step to restore realistic inter-variable or spatio-temporal dependencies. The aim of this study is the multivariate post-processing of ensemble forecasts using a graph neural network (dualGNN) trained with a composite loss function that combines the energy score (ES) and the variogram score (VS). The method is evaluated on two datasets: WRF-based solar irradiance forecasts over northern Chile and ECMWF visibility forecasts for Central Europe. The dualGNN consistently outperforms all empirical copula-based post-processed forecasts and shows significant improvements compared to graph neural networks trained solely on either the continuous ranked probability score (CRPS) or the ES, according to the evaluated multivariate verification metrics. Furthermore, for the WRF forecasts, the rank-order structure of the dualGNN forecasts captures valuable dependency information, enabling a more effective restoration of spatial relationships than either the raw numerical weather prediction ensemble or historical observational rank structures. By contrast, for the visibility forecasts, the GNNs trained on CRPS, ES, or the ES-VS combination outperform the calibrated reference.