Archives AI News

Kernel VICReg for Self-Supervised Learning in Reproducing Kernel Hilbert Space

arXiv:2509.07289v1 Announce Type: new Abstract: Self-supervised learning (SSL) has emerged as a powerful paradigm for representation learning by optimizing geometric objectives--such as invariance to augmentations, variance preservation, and feature decorrelation--without requiring labels. However, most existing methods operate in Euclidean space, limiting their ability to capture nonlinear dependencies and geometric structures. In this work, we propose Kernel VICReg, a novel self-supervised learning framework that lifts the VICReg objective into a Reproducing Kernel Hilbert Space (RKHS). By kernelizing each term of the loss-variance, invariance, and covariance--we obtain a general formulation that operates on double-centered kernel matrices and Hilbert-Schmidt norms, enabling nonlinear feature learning without explicit mappings. We demonstrate that Kernel VICReg not only avoids representational collapse but also improves performance on tasks with complex or small-scale data. Empirical evaluations across MNIST, CIFAR-10, STL-10, TinyImageNet, and ImageNet100 show consistent gains over Euclidean VICReg, with particularly strong improvements on datasets where nonlinear structures are prominent. UMAP visualizations further confirm that kernel-based embeddings exhibit better isometry and class separation. Our results suggest that kernelizing SSL objectives is a promising direction for bridging classical kernel methods with modern representation learning.

Bayesian Pliable Lasso with Horseshoe Prior for Interaction Effects in GLMs with Missing Responses

arXiv:2509.07501v1 Announce Type: cross Abstract: Sparse regression problems, where the goal is to identify a small set of relevant predictors, often require modeling not only main effects but also meaningful interactions through other variables. While the pliable lasso has emerged as a powerful frequentist tool for modeling such interactions under strong heredity constraints, it lacks a natural framework for uncertainty quantification and incorporation of prior knowledge. In this paper, we propose a Bayesian pliable lasso that extends this approach by placing sparsity-inducing priors, such as the horseshoe, on both main and interaction effects. The hierarchical prior structure enforces heredity constraints while adaptively shrinking irrelevant coefficients and allowing important effects to persist. We extend this framework to Generalized Linear Models (GLMs) and develop a tailored approach to handle missing responses. To facilitate posterior inference, we develop an efficient Gibbs sampling algorithm based on a reparameterization of the horseshoe prior. Our Bayesian framework yields sparse, interpretable interaction structures, and principled measures of uncertainty. Through simulations and real-data studies, we demonstrate its advantages over existing methods in recovering complex interaction patterns under both complete and incomplete data. Our method is implemented in the package texttt{hspliable} available on Github.

uGMM-NN: Univariate Gaussian Mixture Model Neural Network

arXiv:2509.07569v1 Announce Type: cross Abstract: This paper introduces the Univariate Gaussian Mixture Model Neural Network (uGMM-NN), a novel neural architecture that embeds probabilistic reasoning directly into the computational units of deep networks. Unlike traditional neurons, which apply weighted sums followed by fixed nonlinearities, each uGMM-NN node parameterizes its activations as a univariate Gaussian mixture, with learnable means, variances, and mixing coefficients. This design enables richer representations by capturing multimodality and uncertainty at the level of individual neurons, while retaining the scalability of standard feedforward networks. We demonstrate that uGMM-NN can achieve competitive discriminative performance compared to conventional multilayer perceptrons, while additionally offering a probabilistic interpretation of activations. The proposed framework provides a foundation for integrating uncertainty-aware components into modern neural architectures, opening new directions for both discriminative and generative modeling.

On the Benefits of Public Representations for Private Transfer Learning under Distribution Shift

arXiv:2312.15551v5 Announce Type: replace-cross Abstract: Public pretraining is a promising approach to improve differentially private model training. However, recent work has noted that many positive research results studying this paradigm only consider in-distribution tasks, and may not apply to settings where there is distribution shift between the pretraining and finetuning data -- a scenario that is likely when finetuning private tasks due to the sensitive nature of the data. In this work, we show empirically across three tasks that even in settings with large distribution shift, where both zero-shot performance from public data and training from scratch with private data give unusably weak results, public features can in fact improve private training accuracy by up to 67% over private training from scratch. We provide a theoretical explanation for this phenomenon, showing that if the public and private data share a low-dimensional representation, public representations can improve the sample complexity of private training even if it is impossible to learn the private task from the public data alone. Altogether, our results provide evidence that public data can indeed make private training practical in realistic settings of extreme distribution shift.

Physics-informed low-rank neural operators with application to parametric elliptic PDEs

arXiv:2509.07687v1 Announce Type: cross Abstract: We present the Physics-Informed Low-Rank Neural Operator (PILNO), a neural operator framework for efficiently approximating solution operators of partial differential equations (PDEs) on point cloud data. PILNO combines low-rank kernel approximations with an encoder--decoder architecture, enabling fast, continuous one-shot predictions while remaining independent of specific discretizations. The model is trained using a physics-informed penalty framework, ensuring that PDE constraints and boundary conditions are satisfied in both supervised and unsupervised settings. We demonstrate its effectiveness on diverse problems, including function fitting, the Poisson equation, the screened Poisson equation with variable coefficients, and parameterized Darcy flow. The low-rank structure provides computational efficiency in high-dimensional parameter spaces, establishing PILNO as a scalable and flexible surrogate modeling tool for PDEs.

Improving the Estimation of Lifetime Effects in A/B Testing via Treatment Locality

arXiv:2407.19618v3 Announce Type: replace-cross Abstract: Utilizing randomized experiments to evaluate the effect of short-term treatments on the short-term outcomes has been well understood and become the golden standard in industrial practice. However, as service systems become increasingly dynamical and personalized, much focus is shifting toward maximizing long-term outcomes, such as customer lifetime value, through lifetime exposure to interventions. Our goal is to assess the impact of treatment and control policies on long-term outcomes from relatively short-term observations, such as those generated by A/B testing. A key managerial observation is that many practical treatments are local, affecting only targeted states while leaving other parts of the policy unchanged. This paper rigorously investigates whether and how such locality can be exploited to improve estimation of long-term effects in Markov Decision Processes (MDPs), a fundamental model of dynamic systems. We first develop optimal inference techniques for general A/B testing in MDPs and establish corresponding efficiency bounds. We then propose methods to harness the localized structure by sharing information on the non-targeted states. Our new estimator can achieve a linear reduction with the number of test arms for a major part of the variance without sacrificing unbiasedness. It also matches a tighter variance lower bound that accounts for locality. Furthermore, we extend our framework to a broad class of differentiable estimators, which encompasses many widely used approaches in practice. We show that all such estimators can benefit from variance reduction through information sharing without increasing their bias. Together, these results provide both theoretical foundations and practical tools for conducting efficient experiments in dynamic service systems with local treatments.

Feature Understanding and Sparsity Enhancement via 2-Layered kernel machines (2L-FUSE)

arXiv:2509.07806v1 Announce Type: cross Abstract: We propose a novel sparsity enhancement strategy for regression tasks, based on learning a data-adaptive kernel metric, i.e., a shape matrix, through 2-Layered kernel machines. The resulting shape matrix, which defines a Mahalanobis-type deformation of the input space, is then factorized via an eigen-decomposition, allowing us to identify the most informative directions in the space of features. This data-driven approach provides a flexible, interpretable and accurate feature reduction scheme. Numerical experiments on synthetic and applications to real datasets of geomagnetic storms demonstrate that our approach achieves minimal yet highly informative feature sets without losing predictive performance.

Expected Signature Kernels for L’evy Rough Paths

arXiv:2509.07893v1 Announce Type: cross Abstract: The expected signature kernel arises in statistical learning tasks as a similarity measure of probability measures on path space. Computing this kernel for known classes of stochastic processes is an important problem that, in particular, can help reduce computational costs. Building on the representation of the expected signature of (inhomogeneous) L'evy processes with absolutely continuous characteristics as the development of an absolutely continuous path in the extended tensor algebra [F.-H.-Tapia, Forum of Mathematics: Sigma (2022), "Unified signature cumulants and generalized Magnus expansions"], we extend the arguments developed for smooth rough paths in [Lemercier-Lyons-Salvi, "Log-PDE Methods for Rough Signature Kernels"] to derive a PDE system for the expected signature of inhomogeneous L'evy processes. As a specific example, we see that the expected signature kernel of Gaussian martingales satisfies a Goursat PDE.

Toward a Metrology for Artificial Intelligence: Hidden-Rule Environments and Reinforcement Learning

arXiv:2509.06213v2 Announce Type: replace-cross Abstract: We investigate reinforcement learning in the Game Of Hidden Rules (GOHR) environment, a complex puzzle in which an agent must infer and execute hidden rules to clear a 6$times$6 board by placing game pieces into buckets. We explore two state representation strategies, namely Feature-Centric (FC) and Object-Centric (OC), and employ a Transformer-based Advantage Actor-Critic (A2C) algorithm for training. The agent has access only to partial observations and must simultaneously infer the governing rule and learn the optimal policy through experience. We evaluate our models across multiple rule-based and trial-list-based experimental setups, analyzing transfer effects and the impact of representation on learning efficiency.

ADHAM: Additive Deep Hazard Analysis Mixtures for Interpretable Survival Regression

arXiv:2509.07108v1 Announce Type: new Abstract: Survival analysis is a fundamental tool for modeling time-to-event outcomes in healthcare. Recent advances have introduced flexible neural network approaches for improved predictive performance. However, most of these models do not provide interpretable insights into the association between exposures and the modeled outcomes, a critical requirement for decision-making in clinical practice. To address this limitation, we propose Additive Deep Hazard Analysis Mixtures (ADHAM), an interpretable additive survival model. ADHAM assumes a conditional latent structure that defines subgroups, each characterized by a combination of covariate-specific hazard functions. To select the number of subgroups, we introduce a post-training refinement that reduces the number of equivalent latent subgroups by merging similar groups. We perform comprehensive studies to demonstrate ADHAM's interpretability at the population, subgroup, and individual levels. Extensive experiments on real-world datasets show that ADHAM provides novel insights into the association between exposures and outcomes. Further, ADHAM remains on par with existing state-of-the-art survival baselines in terms of predictive performance, offering a scalable and interpretable approach to time-to-event prediction in healthcare.