Archives AI News

Rescaled Influence Functions: Accurate Data Attribution in High Dimension

arXiv:2506.06656v2 Announce Type: replace-cross Abstract: How does the training data affect a model's behavior? This is the question we seek to answer with data attribution. The leading practical approaches to data attribution are based on influence functions (IF). IFs utilize a first-order Taylor approximation to efficiently predict the effect of removing a set of samples from the training set without retraining the model, and are used in a wide variety of machine learning applications. However, especially in the high-dimensional regime (# params $\geq \Omega($# samples$)$), they are often imprecise and tend to underestimate the effect of sample removals, even for simple models such as logistic regression. We present rescaled influence functions (RIF), a new tool for data attribution which can be used as a drop-in replacement for influence functions, with little computational overhead but significant improvement in accuracy. We compare IF and RIF on a range of real-world datasets, showing that RIFs offer significantly better predictions in practice, and present a theoretical analysis explaining this improvement. Finally, we present a simple class of data poisoning attacks that would fool IF-based detection but would be detected by RIF.
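The classical influence-function estimate the abstract builds on can be sketched in a few lines. This is a minimal numpy illustration for L2-regularized logistic regression (not the paper's RIF, and all names here are illustrative): the predicted parameter change from removing samples is the inverse Hessian applied to their summed loss gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def influence_removal(X, y, theta, idx, reg=1e-3):
    """Classical IF estimate of the parameter change from removing the
    samples in `idx`, without retraining. Model: L2-regularized
    logistic regression with labels y in {0, 1}."""
    p = sigmoid(X @ theta)
    # Hessian of the total loss: X^T diag(p(1-p)) X + reg * I
    W = p * (1.0 - p)
    H = X.T @ (W[:, None] * X) + reg * np.eye(X.shape[1])
    # Summed per-sample loss gradients of the removed points
    g = X[idx].T @ (p[idx] - y[idx])
    # First-order Taylor prediction: theta_removed - theta ~ H^{-1} g
    return np.linalg.solve(H, g)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (rng.random(200) < sigmoid(X @ np.ones(5))).astype(float)
theta = np.zeros(5)  # stand-in for the fitted minimizer in this sketch
delta = influence_removal(X, y, theta, idx=[0, 1, 2])
print(delta.shape)  # (5,)
```

The paper's point is precisely that in the high-dimensional regime this first-order prediction systematically underestimates the true effect of removal, which RIF corrects.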

Generative Example-Based Explanations: Bridging the Gap between Generative Modeling and Explainability

arXiv:2410.20890v2 Announce Type: replace-cross Abstract: Recently, several methods have leveraged deep generative modeling to produce example-based explanations of image classifiers. Despite producing visually stunning results, these methods are largely disconnected from classical explainability literature. This conceptual and communication gap leads to misunderstandings and misalignments in goals and expectations. In this paper, we bridge this gap by proposing a probabilistic framework for example-based explanations, formally defining example-based explanations in a probabilistic manner amenable to modeling via deep generative models, while remaining coherent with the critical characteristics and desiderata widely accepted in the explainability community. Our aim is, on the one hand, to provide a constructive framework for the development of well-grounded generative algorithms for example-based explanations and, on the other, to facilitate communication between the generative and explainability research communities, foster rigor and transparency, and improve the quality of peer discussion and research progress in this promising direction.

Bias in the Loop: How Humans Evaluate AI-Generated Suggestions

arXiv:2509.08514v1 Announce Type: cross Abstract: Human-AI collaboration increasingly drives decision-making across industries, from medical diagnosis to content moderation. While AI systems promise efficiency gains by providing automated suggestions for human review, these workflows can trigger cognitive biases that degrade performance. We know little about the psychological factors that determine when these collaborations succeed or fail. We conducted a randomized experiment with 2,784 participants to examine how task design and individual characteristics shape human responses to AI-generated suggestions. Using a controlled annotation task, we manipulated three factors: AI suggestion quality in the first three instances, task burden through required corrections, and performance-based financial incentives. We collected demographics, attitudes toward AI, and behavioral data to assess four performance metrics: accuracy, correction activity, overcorrection, and undercorrection. Two patterns emerged that challenge conventional assumptions about human-AI collaboration. First, requiring corrections for flagged AI errors reduced engagement and increased the tendency to accept incorrect suggestions, demonstrating how cognitive shortcuts influence collaborative outcomes. Second, individual attitudes toward AI emerged as the strongest predictor of performance, surpassing demographic factors. Participants skeptical of AI detected errors more reliably and achieved higher accuracy, while those favorable toward automation exhibited dangerous overreliance on algorithmic suggestions. The findings reveal that successful human-AI collaboration depends not only on algorithmic performance but also on who reviews AI outputs and how review processes are structured. Effective human-AI collaborations require consideration of human psychology: selecting diverse evaluator samples, measuring attitudes, and designing workflows that counteract cognitive biases.

A transport approach to the cutoff phenomenon

arXiv:2509.08560v1 Announce Type: cross Abstract: Substantial progress has recently been made in the understanding of the cutoff phenomenon for Markov processes, using an information-theoretic statistic known as varentropy [Sal23; Sal24; Sal25a; PS25]. In the present paper, we propose an alternative approach which bypasses the use of varentropy and exploits instead a new W-TV transport inequality, combined with a classical parabolic regularization estimate [BGL01; OV01]. While currently restricted to non-negatively curved processes on smooth spaces, our argument no longer requires the chain rule, nor any approximate version thereof. As applications, we recover the main result of [Sal25a] establishing cutoff for the log-concave Langevin dynamics, and extend the conclusion to a widely-used discrete-time sampling algorithm known as the Proximal Sampler.

Modified Loss of Momentum Gradient Descent: Fine-Grained Analysis

arXiv:2509.08483v1 Announce Type: cross Abstract: We analyze gradient descent with Polyak heavy-ball momentum (HB) whose fixed momentum parameter $\beta \in (0, 1)$ provides exponential decay of memory. Building on Kovachki and Stuart (2021), we prove that on an exponentially attractive invariant manifold the algorithm is exactly plain gradient descent with a modified loss, provided that the step size $h$ is small enough. Although the modified loss does not admit a closed-form expression, we describe it with arbitrary precision and prove global (finite "time" horizon) approximation bounds $O(h^{R})$ for any finite order $R \geq 2$. We then conduct a fine-grained analysis of the combinatorics underlying the memoryless approximations of HB, in particular uncovering a rich hidden family of polynomials in $\beta$ that contains the Eulerian and Narayana polynomials. We derive continuous modified equations of arbitrary approximation order (with rigorous bounds) and the principal flow that approximates the HB dynamics, generalizing Rosca et al. (2023). Approximation theorems cover both full-batch and mini-batch HB. Our theoretical results shed new light on the main features of gradient descent with heavy-ball momentum, and outline a road-map for similar analysis of other optimization algorithms.
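For reference, the heavy-ball iteration analyzed here is short enough to state in code. A minimal numpy sketch (function and parameter names are illustrative, not from the paper):

```python
import numpy as np

def heavy_ball(grad, x0, h=0.01, beta=0.9, steps=500):
    """Gradient descent with Polyak heavy-ball momentum:
        v_{t+1} = beta * v_t - h * grad(x_t),  x_{t+1} = x_t + v_{t+1}.
    beta in (0, 1) gives exponential decay of memory; the paper shows
    that for small enough h the iterates match plain gradient descent
    on a modified loss."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = beta * v - h * grad(x)  # momentum accumulates past gradients
        x = x + v
    return x

# Toy quadratic loss f(x) = 0.5 * x^T A x, so grad f(x) = A x
A = np.diag([1.0, 10.0])
x_star = heavy_ball(lambda x: A @ x, x0=[5.0, 5.0])
print(np.linalg.norm(x_star) < 1e-3)  # True: converged to the minimizer
```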

A hierarchical entropy method for the delocalization of bias in high-dimensional Langevin Monte Carlo

arXiv:2509.08619v1 Announce Type: new Abstract: The unadjusted Langevin algorithm is widely used for sampling from complex high-dimensional distributions. It is well known to be biased, with the bias typically scaling linearly with the dimension when measured in squared Wasserstein distance. However, the recent paper of Chen et al. (2024) identifies an intriguing new delocalization effect: For a class of distributions with sparse interactions, the bias between low-dimensional marginals scales only with the lower dimension, not the full dimension. In this work, we strengthen the results of Chen et al. (2024) in the sparse interaction regime by removing a logarithmic factor, measuring distance in relative entropy (a.k.a. KL-divergence), and relaxing the strong log-concavity assumption. In addition, we expand the scope of the delocalization phenomenon by showing that it holds for a class of distributions with weak interactions. Our proofs are based on a hierarchical analysis of the marginal relative entropies, inspired by the authors' recent work on propagation of chaos.
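The unadjusted Langevin algorithm the abstract studies is the simple discretization below. A minimal numpy sketch under standard assumptions (the target and step size here are illustrative, not the paper's setting):

```python
import numpy as np

def ula(grad_U, x0, h, steps, rng):
    """Unadjusted Langevin algorithm targeting pi proportional to exp(-U):
        x_{k+1} = x_k - h * grad U(x_k) + sqrt(2h) * xi_k,  xi_k ~ N(0, I).
    The lack of a Metropolis correction makes it biased; the abstract
    concerns how that bias behaves on low-dimensional marginals."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - h * grad_U(x) + np.sqrt(2 * h) * rng.normal(size=x.shape)
    return x

# Standard Gaussian target: U(x) = ||x||^2 / 2, so grad U(x) = x
rng = np.random.default_rng(1)
d = 100
samples = np.array([ula(lambda x: x, np.zeros(d), h=0.05, steps=200, rng=rng)
                    for _ in range(50)])
print(samples.shape)  # (50, 100)
```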

Accelerating Hamiltonian Monte Carlo for Bayesian Inference in Neural Networks and Neural Operators

arXiv:2507.14652v2 Announce Type: replace Abstract: Hamiltonian Monte Carlo (HMC) is a powerful and accurate method to sample from the posterior distribution in Bayesian inference. However, HMC techniques are computationally demanding for Bayesian neural networks due to the high dimensionality of the network's parameter space and the non-convexity of their posterior distributions. Therefore, various approximation techniques, such as variational inference (VI) or stochastic gradient MCMC, are often employed to infer the posterior distribution of the network parameters. Such approximations introduce inaccuracies in the inferred distributions, resulting in unreliable uncertainty estimates. In this work, we propose a hybrid approach that combines inexpensive VI and accurate HMC methods to efficiently and accurately quantify uncertainties in neural networks and neural operators. The proposed approach leverages an initial VI training on the full network. We examine the influence of individual parameters on the prediction uncertainty, which shows that a large proportion of the parameters do not contribute substantially to uncertainty in the network predictions. This information is then used to significantly reduce the dimension of the parameter space, and HMC is performed only for the subset of network parameters that strongly influence prediction uncertainties. This yields a framework for accelerating the full batch HMC for posterior inference in neural networks. We demonstrate the efficiency and accuracy of the proposed framework on deep neural networks and operator networks, showing that inference can be performed for large networks with tens to hundreds of thousands of parameters. We show that this method can effectively learn surrogates for complex physical systems by modeling the operator that maps from upstream conditions to wall-pressure data on a cone in hypersonic flow.
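For context, the HMC transition whose cost the paper reduces looks as follows. This is a minimal leapfrog HMC step with Metropolis correction, a textbook sketch rather than the paper's VI+HMC hybrid (all names are illustrative):

```python
import numpy as np

def hmc_step(U, grad_U, x, eps, L, rng):
    """One HMC transition targeting pi proportional to exp(-U):
    resample momentum, integrate Hamiltonian dynamics with L leapfrog
    steps of size eps, then accept/reject on the energy error."""
    p = rng.normal(size=x.shape)              # fresh Gaussian momentum
    x_new = x.copy()
    p_new = p - 0.5 * eps * grad_U(x_new)     # initial half step
    for i in range(L):
        x_new = x_new + eps * p_new           # full position step
        g = grad_U(x_new)
        # full momentum step, except a half step at the end
        p_new = p_new - (eps if i < L - 1 else 0.5 * eps) * g
    dH = (U(x_new) + 0.5 * p_new @ p_new) - (U(x) + 0.5 * p @ p)
    return x_new if np.log(rng.random()) < -dH else x

rng = np.random.default_rng(2)
U = lambda z: 0.5 * z @ z                     # standard Gaussian target
x = np.ones(10)
for _ in range(100):
    x = hmc_step(U, lambda z: z, x, eps=0.2, L=10, rng=rng)
print(x.shape)  # (10,)
```

Each transition needs L gradient evaluations over the full parameter vector, which is why restricting HMC to the uncertainty-relevant subset of parameters, as the paper proposes, pays off for large networks.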

Data-driven generative simulation of SDEs using diffusion models

arXiv:2509.08731v1 Announce Type: cross Abstract: This paper introduces a new approach to generating sample paths of unknown stochastic differential equations (SDEs) using diffusion models, a class of generative AI models commonly employed in image and video applications. Unlike the traditional Monte Carlo methods for simulating SDEs, which require explicit specifications of the drift and diffusion coefficients, our method takes a model-free, data-driven approach. Given a finite set of sample paths from an SDE, we utilize conditional diffusion models to generate new, synthetic paths of the same SDE. To demonstrate the effectiveness of our approach, we conduct a simulation experiment to compare our method with alternative benchmarks, including neural SDEs. Furthermore, in an empirical study we leverage these synthetically generated sample paths to enhance the performance of reinforcement learning algorithms for continuous-time mean-variance portfolio selection, hinting at promising applications of diffusion models in financial analysis and decision-making.
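The "traditional Monte Carlo" baseline the abstract contrasts with is Euler-Maruyama simulation, which needs the drift and diffusion coefficients in closed form. A minimal numpy sketch with an illustrative geometric Brownian motion (names and parameters are not from the paper):

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, T, n_steps, n_paths, rng):
    """Classical simulation of dX_t = mu(X_t) dt + sigma(X_t) dW_t.
    Unlike the paper's data-driven diffusion-model approach, this
    requires explicit drift mu and diffusion sigma."""
    dt = T / n_steps
    X = np.full((n_paths, n_steps + 1), x0, dtype=float)
    for k in range(n_steps):
        dW = rng.normal(scale=np.sqrt(dt), size=n_paths)
        X[:, k + 1] = X[:, k] + mu(X[:, k]) * dt + sigma(X[:, k]) * dW
    return X

# Geometric Brownian motion: mu(x) = 0.05 x, sigma(x) = 0.2 x
rng = np.random.default_rng(3)
paths = euler_maruyama(lambda x: 0.05 * x, lambda x: 0.2 * x,
                       x0=1.0, T=1.0, n_steps=250, n_paths=1000, rng=rng)
print(paths.shape)  # (1000, 251)
```

The paper's method instead learns to generate such paths directly from observed trajectories, with no access to `mu` or `sigma`.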

On the Sample Complexity of Set Membership Estimation for Linear Systems with Disturbances Bounded by Convex Sets

arXiv:2406.00574v3 Announce Type: replace-cross Abstract: This paper revisits set membership estimation (SME) for linear control systems and establishes its convergence rates under relaxed assumptions on (i) the persistent excitation requirement and (ii) the system disturbances. In particular, instead of assuming persistent excitation exactly, this paper adopts the block-martingale small-ball condition enabled by randomly perturbed control policies to establish the convergence rates of SME with high probability. Further, we relax the assumptions on the shape of the bounded disturbance set and the boundary-visiting condition. Our convergence rates hold for disturbances bounded by general convex sets, which bridges the gap between the previous convergence analysis for general convex sets and the existing convergence rate analysis for $\ell_\infty$ balls. Further, we validate our convergence rates by several numerical experiments. This manuscript contains supplementary content in the Appendix.

Identification and Estimation of Simultaneous Equation Models Using Higher-Order Cumulant Restrictions

arXiv:2501.06777v2 Announce Type: replace-cross Abstract: Identifying structural parameters in linear simultaneous-equation models is a longstanding challenge. Recent work exploits information in higher-order moments of non-Gaussian data. In this literature, the structural errors are typically assumed to be uncorrelated so that, after standardizing the covariance matrix of the observables (whitening), the structural parameter matrix becomes orthogonal -- a device that underpins many identification proofs but can be restrictive in econometric applications. We show that neither zero covariance nor whitening is necessary. For any order $h > 2$, a simple diagonality condition on the $h$th-order cumulants alone identifies the structural parameter matrix -- up to unknown scaling and permutation -- as the solution to an eigenvector problem; no restrictions on cumulants of other orders are required. This general, single-order result enlarges the class of models covered by our framework and yields a sample-analogue estimator that is $\sqrt{n}$-consistent, asymptotically normal, and easy to compute. Furthermore, when uncorrelatedness is intrinsic -- as in vector autoregressive (VAR) models -- our framework provides a transparent overidentification test. Monte Carlo experiments show favorable finite-sample performance, and two applications -- "Returns to Schooling" and "Uncertainty and the Business Cycle" -- demonstrate its practical value.