Archives AI News

Discovering Heterogeneous Treatment Effects in Regression Discontinuity Designs

arXiv:2106.11640v4 Announce Type: replace-cross Abstract: The paper proposes a causal supervised machine learning algorithm to uncover treatment effect heterogeneity in sharp and fuzzy regression discontinuity (RD) designs. We develop a criterion for building an honest "regression discontinuity tree", where each leaf contains the RD estimate of the treatment effect conditional on the values of some pre-treatment covariates. It is a priori unknown which covariates are relevant for capturing treatment effect heterogeneity, and it is the task of the algorithm to discover them, without invalidating inference, while employing a nonparametric estimator with an expected-MSE-optimal bandwidth. We study the performance of the method through Monte Carlo simulations and apply it to uncover various sources of heterogeneity in the impact of attending a better secondary school in Romania.
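To make the leaf-level estimand concrete, here is a minimal sketch of sharp RD estimates computed within covariate-defined leaves on an honest (held-out) sample, assuming a fixed bandwidth and a given partition; the paper's tree-growing criterion, fuzzy designs, and MSE-optimal bandwidth selection are not reproduced, and all names and numbers below are illustrative.

```python
import numpy as np

def sharp_rd_estimate(x, y, cutoff=0.0, h=0.5):
    """Local linear RD estimate: difference of fitted intercepts at the cutoff."""
    def intercept(mask):
        X = np.column_stack([np.ones(mask.sum()), x[mask] - cutoff])
        beta, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
        return beta[0]
    left = (x >= cutoff - h) & (x < cutoff)
    right = (x >= cutoff) & (x <= cutoff + h)
    return intercept(right) - intercept(left)

rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(-1, 1, n)
z = rng.integers(0, 2, n)                # binary pre-treatment covariate
tau = np.where(z == 1, 2.0, 0.5)         # heterogeneous true effect
y = 1 + x + tau * (x >= 0) + rng.normal(0, 0.5, n)

# "Honesty": one half of the sample would grow the tree; the held-out half
# estimates the leaf effects. Here the partition on z is taken as given.
est = rng.permutation(n)[n // 2:]
for v in (0, 1):
    leaf = est[z[est] == v]
    print(f"leaf z={v}: tau_hat = {sharp_rd_estimate(x[leaf], y[leaf]):.2f}")
```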

Weighted Support Points from Random Measures: An Interpretable Alternative for Generative Modeling

arXiv:2508.21255v1 Announce Type: new Abstract: Support points summarize a large dataset through a smaller set of representative points that can be used for data operations, such as Monte Carlo integration, without requiring access to the full dataset. In this sense, support points offer a compact yet informative representation of the original data. We build on this idea to introduce a generative modeling framework based on random weighted support points, where the randomness arises from a weighting scheme inspired by the Dirichlet process and the Bayesian bootstrap. The proposed method generates diverse and interpretable sample sets from a fixed dataset, without relying on probabilistic modeling assumptions or neural network architectures. We present the theoretical formulation of the method and develop an efficient optimization algorithm based on the Convex-Concave Procedure (CCP). Empirical results on the MNIST and CelebA-HQ datasets show that our approach produces high-quality and diverse outputs at a fraction of the computational cost of black-box alternatives such as Generative Adversarial Networks (GANs) or Denoising Diffusion Probabilistic Models (DDPMs). These results suggest that random weighted support points offer a principled, scalable, and interpretable alternative for generative modeling. A key feature is their ability to produce genuinely interpolative samples that preserve underlying data structure.
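As a rough illustration of the ingredients (not the paper's algorithm), the sketch below draws Dirichlet / Bayesian-bootstrap weights for the data and runs a CCP-style fixed-point update that decreases a weighted energy distance between the support points and the weighted data; the update rule and constants are our simplification.

```python
import numpy as np

def weighted_support_points(Y, n_points=10, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.dirichlet(np.ones(len(Y)))       # random Bayesian-bootstrap weights
    X = Y[rng.choice(len(Y), n_points, replace=False)].astype(float)
    for _ in range(n_iter):
        for i in range(n_points):
            d_y = np.linalg.norm(X[i] - Y, axis=1) + 1e-12
            d_x = np.linalg.norm(X[i] - X, axis=1) + 1e-12
            q = w / d_y                       # attraction toward weighted data
            # repulsion between support points (the i-th term is zero):
            repel = ((X[i] - X) / d_x[:, None]).sum(axis=0) / n_points
            X[i] = (q @ Y + repel) / q.sum()  # majorization fixed-point update
    return X

Y = np.random.default_rng(1).normal(size=(500, 2))
print(weighted_support_points(Y, n_points=5)[:2])
```

Re-drawing the Dirichlet weights yields a different, equally representative set of points, which is the source of the generative diversity.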

Bayesian Double Descent

arXiv:2507.07338v2 Announce Type: replace Abstract: Double descent is a phenomenon of over-parameterized statistical models. Our goal is to view double descent from a Bayesian perspective. Over-parameterized models such as deep neural networks have an interesting re-descending property in their risk characteristics. This is a recent phenomenon in machine learning and has been the subject of many studies. As the complexity of the model increases, there is a U-shaped region corresponding to the traditional bias-variance trade-off; then, as the number of parameters equals the number of observations and the model becomes an interpolator, the risk can become infinite, and in the over-parameterized region it re-descends -- the double descent effect. We show that this has a natural Bayesian interpretation. Moreover, we show that it is not in conflict with the traditional Occam's razor that Bayesian models possess, in that they tend to prefer simpler models when possible. We develop comprehensive theoretical foundations, including Dawid's model comparison theory, Dickey-Savage results, and connections to generalized ridge regression and shrinkage methods. We illustrate the approach with examples of Bayesian model selection in neural networks and provide detailed treatments of infinite Gaussian means models and non-parametric regression. Finally, we conclude with directions for future research.
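For readers unfamiliar with the phenomenon itself, the following toy demo (ours, unrelated to the paper's Bayesian machinery) traces the classic double-descent risk curve: the test risk of the minimum-norm least-squares fit on random cosine features peaks near the interpolation threshold p = n and re-descends beyond it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_test = 100, 1000
x = rng.uniform(-1, 1, (n, 1))
x_test = rng.uniform(-1, 1, (n_test, 1))
f = lambda t: np.sin(4 * t).ravel()          # true regression function
y = f(x) + 0.1 * rng.normal(size=n)

for p in (10, 50, 100, 200, 1000):           # p = n is the interpolation point
    rng_p = np.random.default_rng(42)        # shared features for train/test
    W = rng_p.normal(size=(1, p))
    b = rng_p.uniform(0, 2 * np.pi, p)
    Phi, Phi_t = np.cos(x @ W + b), np.cos(x_test @ W + b)
    beta = np.linalg.pinv(Phi) @ y           # minimum-norm least-squares fit
    risk = np.mean((Phi_t @ beta - f(x_test)) ** 2)
    print(f"p = {p:5d}   test risk = {risk:.3f}")
```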

Guaranteed Nonconvex Factorization Approach for Tensor Train Recovery

arXiv:2401.02592v3 Announce Type: replace Abstract: In this paper, we provide the first convergence guarantee for the factorization approach to tensor train (TT) recovery. Specifically, to avoid the scaling ambiguity and to facilitate theoretical analysis, we optimize over the so-called left-orthogonal TT format, which enforces orthonormality among most of the factors. To ensure the orthonormal structure, we utilize Riemannian gradient descent (RGD) to optimize those factors over the Stiefel manifold. We first delve into the TT factorization problem and establish the local linear convergence of RGD. Notably, the rate of convergence only declines linearly as the tensor order increases. We then study the sensing problem, which aims to recover a TT format tensor from linear measurements. Assuming the sensing operator satisfies the restricted isometry property (RIP), we show that with a proper initialization, which could be obtained through spectral initialization, RGD also converges to the ground-truth tensor at a linear rate. Furthermore, we expand our analysis to encompass scenarios involving Gaussian noise in the measurements. We prove that RGD can reliably recover the ground truth at a linear rate, with the recovery error exhibiting only polynomial growth in relation to the tensor order. We conduct various experiments to validate our theoretical findings.
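The basic building block, one RGD step on the Stiefel manifold keeping a factor orthonormal, can be sketched as follows; the TT-sensing objective and initialization are omitted, and the projection/retraction choices shown are standard ones, not necessarily the paper's exact implementation.

```python
import numpy as np

def stiefel_rgd_step(U, euclid_grad, lr=0.1):
    """Project the Euclidean gradient onto the tangent space at U, take a
    step, then retract back onto the Stiefel manifold via QR."""
    G = euclid_grad
    rgrad = G - U @ (U.T @ G + G.T @ U) / 2      # tangent-space projection
    Q, R = np.linalg.qr(U - lr * rgrad)          # QR retraction
    return Q * np.sign(np.diag(R))               # canonical sign convention

rng = np.random.default_rng(0)
U = np.linalg.qr(rng.normal(size=(8, 3)))[0]     # a point on St(8, 3)
U_new = stiefel_rgd_step(U, rng.normal(size=(8, 3)))
print(np.allclose(U_new.T @ U_new, np.eye(3)))   # columns stay orthonormal
```

In the TT setting, each left-orthogonal factor would be updated this way using the gradient of the sensing loss with respect to that factor.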

Learning covariate importance for matching in policy-relevant observational research

arXiv:2403.12367v2 Announce Type: replace Abstract: Matching methods are widely used to reduce confounding effects in observational studies, but conventional approaches often treat all covariates as equally important, which can result in poor performance when covariates differ in their relevance to the study. We propose the Priority-Aware one-to-one Matching Algorithm (PAMA), a novel semi-supervised framework that learns a covariate importance measure from a subset of units paired by experts and uses it to match additional units. It optimizes a weighted quadratic score that reflects the relevance of each covariate to the study, and iteratively updates the covariate importance measure in the score function using unlabeled data. PAMA is model-free, but we establish that the covariate importance measure -- the learned weights -- is consistent when the oracle matching rule aligns with the design. In addition, we introduce extensions that address imbalanced data, accommodate temporal covariates, and improve robustness to mispaired observations. In simulations, PAMA outperforms standard methods, particularly in high-dimensional settings and under model misspecification. Applied to a real-world study of in-person schooling and COVID-19 transmission, PAMA recovers nearly twice as many expert-designated matches as competing methods using baseline covariates. A self-taught learning extension improves performance in simulations, though its benefit is context-dependent. To our knowledge, PAMA is the first framework to apply semi-supervised learning to observational matching with covariates of unequal relevance. It offers a scalable and interpretable tool for incorporating expert insight into policy-relevant observational research.
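A minimal sketch of the general idea (not PAMA itself): learn per-covariate weights from expert-labeled pairs, upweighting covariates that agree tightly within those pairs, then solve a weighted one-to-one assignment for new units. All names, the weighting rule, and the toy data are ours.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def learn_weights(treated, controls):
    """treated[i] was expert-matched to controls[i]; both are (m, p) arrays.
    Covariates with small within-pair discrepancy get large weight."""
    within = np.mean((treated - controls) ** 2, axis=0)
    w = 1.0 / (within + 1e-8)
    return w / w.sum()

def weighted_match(Xt, Xc, w):
    """One-to-one matching of treated rows Xt to control rows Xc."""
    cost = (((Xt[:, None, :] - Xc[None, :, :]) ** 2) * w).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)
    return cols                                  # matched control index per unit

rng = np.random.default_rng(0)
pairs_t, pairs_c = rng.normal(size=(30, 4)), rng.normal(size=(30, 4))
pairs_c[:, 0] = pairs_t[:, 0]    # experts implicitly match tightly on covariate 0
w = learn_weights(pairs_t, pairs_c)
print(w.round(3))                # covariate 0 dominates the learned weights
print(weighted_match(rng.normal(size=(5, 4)), rng.normal(size=(8, 4)), w))
```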

Data-driven Discovery of Digital Twins in Biomedical Research

arXiv:2508.21484v1 Announce Type: cross Abstract: Recent technological advances have expanded the availability of high-throughput biological datasets, enabling the reliable design of digital twins of biomedical systems or patients. Such computational tools represent key reaction networks driving perturbation or drug response and can guide drug discovery and personalized therapeutics. Yet, their development still relies on laborious data integration by the human modeler, so automated approaches are critically needed. The success of data-driven system discovery in physics, rooted in clean datasets and well-defined governing laws, has fueled interest in applying similar techniques in biology, which presents unique challenges. Here, we review methodologies for automatically inferring digital twins from biological time series, which mostly involve symbolic or sparse regression. We evaluate algorithms against eight biological and methodological challenges, associated with noisy/incomplete data, multiple conditions, prior knowledge integration, latent variables, high dimensionality, unobserved variable derivatives, candidate library design, and uncertainty quantification. On these criteria, sparse regression generally outperformed symbolic regression, particularly when using Bayesian frameworks. We further highlight the emerging role of deep learning and large language models, which enable innovative prior knowledge integration, though the reliability and consistency of such approaches must be improved. While no single method addresses all challenges, we argue that progress in learning digital twins will come from hybrid and modular frameworks combining chemical reaction network-based mechanistic grounding, Bayesian uncertainty quantification, and the generative and knowledge integration capacities of deep learning. To support their development, we further propose a benchmarking framework to evaluate methods across all challenges.
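As a concrete instance of the sparse-regression family the review covers, here is a SINDy-style sketch: regress numerically estimated derivatives of a toy two-species time series on a candidate term library and threshold small coefficients. The library, threshold, and dynamics are illustrative choices.

```python
import numpy as np

# Toy system: dx/dt = -0.5*x + 0.2*x*y,  dy/dt = 0.3*y - 0.1*x*y
dt, T = 0.01, 2000
X = np.zeros((T, 2)); X[0] = [2.0, 1.0]
for t in range(T - 1):
    x, y = X[t]
    X[t + 1] = X[t] + dt * np.array([-0.5*x + 0.2*x*y, 0.3*y - 0.1*x*y])

dXdt = np.gradient(X, dt, axis=0)                    # numerical derivatives
x, y = X[:, 0], X[:, 1]
library = np.column_stack([x, y, x*y, x**2, y**2])   # candidate terms
names = ["x", "y", "x*y", "x^2", "y^2"]

coef, *_ = np.linalg.lstsq(library, dXdt, rcond=None)
coef[np.abs(coef) < 0.05] = 0.0                      # one thresholding pass
for k, lhs in enumerate(["dx/dt", "dy/dt"]):
    terms = [f"{c:+.2f}*{n}" for c, n in zip(coef[:, k], names) if c != 0]
    print(lhs, "=", " ".join(terms))
```

Real biological data adds the challenges the review enumerates (noise, latent species, multiple conditions), which is where the Bayesian variants it favors become important.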

Convergence of Stochastic Gradient Methods for Wide Two-Layer Physics-Informed Neural Networks

arXiv:2508.21571v1 Announce Type: cross Abstract: Physics-informed neural networks (PINNs) represent a very popular class of neural solvers for partial differential equations. In practice, one often employs stochastic gradient descent type algorithms to train the neural network, so the convergence guarantee of stochastic gradient descent is of fundamental importance. In this work, we establish the linear convergence, with high probability, of stochastic gradient descent / flow in training over-parameterized two-layer PINNs for a general class of activation functions. These results extend the existing result [18], in which gradient descent was analyzed. The challenge of the analysis lies in handling the dynamic randomness introduced by stochastic optimization methods, and the key lies in ensuring the positive definiteness of suitable Gram matrices during training. The analysis sheds light on the dynamics of the optimization process and provides guarantees for neural networks trained by stochastic algorithms.
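The setting being analyzed can be sketched as follows: a wide two-layer network trained by minibatch SGD on a PINN loss for the toy problem u'(x) = cos(pi x), u(0) = 0 (solution u = sin(pi x)/pi), with hand-written gradients. This illustrates the training regime studied, not the paper's proof; the scaling and step size are our guesses.

```python
import numpy as np

rng = np.random.default_rng(0)
m, lr = 512, 0.05                            # wide hidden layer
w, b = rng.normal(size=m), rng.normal(size=m)
a = rng.normal(size=m) / np.sqrt(m)          # NTK-style output scaling

def u(x):
    return np.tanh(np.outer(x, w) + b) @ a

for step in range(5000):
    x = rng.uniform(0, 1, 64)                # minibatch of collocation points
    t = np.tanh(np.outer(x, w) + b)
    s = 1.0 - t**2                           # tanh'
    r = (s * w) @ a - np.cos(np.pi * x)      # PDE residual u'(x) - f(x)
    u0 = np.tanh(b) @ a                      # boundary residual u(0)
    ga = (r[:, None] * s * w).mean(0) + u0 * np.tanh(b)
    gw = (r[:, None] * a * s * (1 - 2 * w * t * x[:, None])).mean(0)
    gb = (r[:, None] * a * w * (-2 * t * s)).mean(0) + u0 * a * (1 - np.tanh(b)**2)
    a, w, b = a - lr * ga, w - lr * gw, b - lr * gb

xs = np.linspace(0, 1, 5)
print(np.abs(u(xs) - np.sin(np.pi * xs) / np.pi).max())  # should be small
```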

BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design

arXiv:2508.21184v1 Announce Type: cross Abstract: We propose a general-purpose approach for improving the ability of Large Language Models (LLMs) to intelligently and adaptively gather information from a user or other external source using the framework of sequential Bayesian experimental design (BED). This enables LLMs to act as effective multi-turn conversational agents and to interface interactively with external environments. Our approach, which we call BED-LLM (Bayesian Experimental Design with Large Language Models), is based on iteratively choosing questions or queries that maximize the expected information gain (EIG) about the task of interest, given the responses gathered previously. We show how this EIG can be formulated in a principled way using a probabilistic model derived from the LLM's belief distribution and provide detailed insights into key decisions in its construction. Further key to the success of BED-LLM are a number of specific innovations: a carefully designed estimator for the EIG, conditioning on previous responses without relying solely on in-context updates, and a targeted strategy for proposing candidate queries. We find that BED-LLM achieves substantial gains in performance across a wide range of tests based on the 20-questions game and on using the LLM to actively infer user preferences, compared to direct prompting of the LLM and other adaptive design strategies.
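The EIG criterion at the heart of the method can be illustrated with a finite hypothesis set standing in for the LLM's belief distribution (no LLM calls; the answer model and names below are our simplification):

```python
import numpy as np

hypotheses = ["cat", "dog", "eagle", "salmon"]
questions = {                       # deterministic yes(1)/no(0) answer model
    "Is it a mammal?": [1, 1, 0, 0],
    "Can it fly?":     [0, 0, 1, 0],
    "Does it swim?":   [0, 0, 0, 1],
}
belief = np.full(len(hypotheses), 0.25)   # current belief over hypotheses

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def eig(answers, belief):
    """Expected reduction in posterior entropy from asking this question."""
    answers = np.array(answers)
    h, gain = entropy(belief), 0.0
    for ans in (0, 1):
        mask = answers == ans
        p_ans = belief[mask].sum()
        if p_ans > 0:
            post = np.where(mask, belief, 0.0) / p_ans
            gain += p_ans * (h - entropy(post))
    return gain

print({q: round(eig(a, belief), 3) for q, a in questions.items()})
print("ask:", max(questions, key=lambda q: eig(questions[q], belief)))
```

The even 2-2 split ("Is it a mammal?") yields a full bit of expected information and is chosen; BED-LLM replaces this toy answer model with a probabilistic model derived from the LLM itself.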

Adaptive generative moment matching networks for improved learning of dependence structures

arXiv:2508.21531v1 Announce Type: new Abstract: An adaptive bandwidth selection procedure for the mixture kernel in the maximum mean discrepancy (MMD) for fitting generative moment matching networks (GMMNs) is introduced, and its ability to improve the learning of copula random number generators is demonstrated. Based on the relative error of the training loss, the number of kernels is increased during training; additionally, the relative error of the validation loss is used as an early stopping criterion. While the training time of such adaptively trained GMMNs (AGMMNs) is similar to that of GMMNs, training performance is increased significantly in comparison to GMMNs, which is assessed and shown based on validation MMD trajectories, samples and validation MMD values. Superiority of AGMMNs over GMMNs, as well as over typical parametric copula models, is demonstrated in terms of three applications. First, convergence rates of quasi-random versus pseudo-random samples from high-dimensional copulas are investigated for three functionals of interest, in dimensions as large as 100 for the first time. Second, replicated validation MMDs, as well as Monte Carlo and quasi-Monte Carlo applications based on the expected payoff of a basket call option and the risk measure expected shortfall as functionals, are used to demonstrate the improved training of AGMMNs over GMMNs for a copula model fitted to the standardized residuals of the 50 constituents of the S&P 500 index after deGARCHing. Last, both the latter dataset and 50 constituents of the FTSE 100 are used to demonstrate that the improved training of AGMMNs over GMMNs, and in comparison to the fitting of classical parametric copula models, indeed also translates into improved model prediction.
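A sketch of the two ingredients, assuming placeholder thresholds and no actual generator training: an MMD estimator with a mixture of Gaussian kernels, and an adaptive rule that enriches the kernel mixture when the relative error of the training loss stalls.

```python
import numpy as np

def mixture_mmd2(X, Y, bandwidths):
    """Biased MMD^2 estimate with an equally weighted Gaussian kernel mixture."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return sum(np.exp(-d2 / (2 * h**2)) for h in bandwidths) / len(bandwidths)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
bandwidths, losses = [0.5], []
for epoch in range(50):
    # ... one training epoch of the GMMN generator would go here ...
    loss = mixture_mmd2(rng.normal(size=(64, 2)),       # stand-in for generator
                        rng.normal(size=(64, 2)),       # stand-in for data batch
                        bandwidths)
    losses.append(loss)
    if len(losses) > 1:
        rel_err = abs(losses[-1] - losses[-2]) / (abs(losses[-2]) + 1e-12)
        if rel_err < 1e-2:                   # training loss has stalled:
            bandwidths.append(bandwidths[-1] / 2)       # enrich the mixture
print(f"kernels in mixture after training: {len(bandwidths)}")
```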

Convolutional Rectangular Attention Module

arXiv:2503.10875v2 Announce Type: replace-cross Abstract: In this paper, we introduce a novel spatial attention module that can be easily integrated into any convolutional network. This module guides the model to pay attention to the most discriminative part of an image, enabling it to attain better performance through end-to-end training. In conventional approaches, a spatial attention map is typically generated in a position-wise manner, which often results in irregular boundaries and can therefore hamper generalization to new samples. In our method, the attention region is constrained to be rectangular. This rectangle is parametrized by only 5 parameters, allowing for better stability and generalization to new samples. In our experiments, our method systematically outperforms the position-wise counterpart, providing a novel, useful spatial attention mechanism for convolutional models. Besides, our module also provides interpretability regarding the "where to look" question, as it helps identify the part of the input on which the model focuses to produce its prediction.
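One plausible parameterization of such a 5-parameter soft rectangle (our guess at a sensible choice; the paper's exact parameterization and CNN integration are not reproduced) is:

```python
import numpy as np

def soft_rectangle(H, W, cx, cy, w, h, sharpness=10.0):
    """Differentiable attention map in [0,1]: ~1 inside the rectangle centered
    at (cx, cy) with width w and height h (normalized coords), ~0 outside.
    The 5 parameters are cx, cy, w, h, and the edge sharpness."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    ys, xs = np.mgrid[0:H, 0:W] / np.array([H, W])[:, None, None]
    ax = sig(sharpness * (xs - (cx - w / 2))) * sig(sharpness * ((cx + w / 2) - xs))
    ay = sig(sharpness * (ys - (cy - h / 2))) * sig(sharpness * ((cy + h / 2) - ys))
    return ax * ay

att = soft_rectangle(8, 8, cx=0.5, cy=0.5, w=0.4, h=0.6, sharpness=40.0)
print(att.round(1))
# Feature maps would be multiplied elementwise by `att` before pooling,
# so gradients flow back to the 5 rectangle parameters end-to-end.
```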