Archives AI News

Toric geometry of ReLU neural networks

arXiv:2509.05894v1 Announce Type: cross Abstract: Given a continuous finitely piecewise linear function $f:mathbb{R}^{n_0} to mathbb{R}$ and a fixed architecture $(n_0,ldots,n_k;1)$ of feedforward ReLU neural networks, the exact function realization problem is to determine when some network with the given architecture realizes $f$. To develop a systematic way to answer these questions, we establish a connection between toric geometry and ReLU neural networks. This approach enables us to utilize numerous structures and tools from algebraic geometry to study ReLU neural networks. Starting with an unbiased ReLU neural network with rational weights, we define the ReLU fan, the ReLU toric variety, and the ReLU Cartier divisor associated with the network. This work also reveals the connection between the tropical geometry and the toric geometry of ReLU neural networks. As an application of the toric geometry framework, we prove a necessary and sufficient criterion of functions realizable by unbiased shallow ReLU neural networks by computing intersection numbers of the ReLU Cartier divisor and torus-invariant curves.

Asynchronous Gossip Algorithms for Rank-Based Statistical Methods

arXiv:2509.07543v1 Announce Type: new Abstract: As decentralized AI and edge intelligence become increasingly prevalent, ensuring robustness and trustworthiness in such distributed settings has become a critical issue-especially in the presence of corrupted or adversarial data. Traditional decentralized algorithms are vulnerable to data contamination as they typically rely on simple statistics (e.g., means or sum), motivating the need for more robust statistics. In line with recent work on decentralized estimation of trimmed means and ranks, we develop gossip algorithms for computing a broad class of rank-based statistics, including L-statistics and rank statistics-both known for their robustness to outliers. We apply our method to perform robust distributed two-sample hypothesis testing, introducing the first gossip algorithm for Wilcoxon rank-sum tests. We provide rigorous convergence guarantees, including the first convergence rate bound for asynchronous gossip-based rank estimation. We empirically validate our theoretical results through experiments on diverse network topologies.

Identifying Neural Signatures from fMRI using Hybrid Principal Components Regression

arXiv:2509.07300v1 Announce Type: new Abstract: Recent advances in neuroimaging analysis have enabled accurate decoding of mental state from brain activation patterns during functional magnetic resonance imaging scans. A commonly applied tool for this purpose is principal components regression regularized with the least absolute shrinkage and selection operator (LASSO PCR), a type of multi-voxel pattern analysis (MVPA). This model presumes that all components are equally likely to harbor relevant information, when in fact the task-related signal may be concentrated in specific components. In such cases, the model will fail to select the optimal set of principal components that maximizes the total signal relevant to the cognitive process under study. Here, we present modifications to LASSO PCR that allow for a regularization penalty tied directly to the index of the principal component, reflecting a prior belief that task-relevant signal is more likely to be concentrated in components explaining greater variance. Additionally, we propose a novel hybrid method, Joint Sparsity-Ranked LASSO (JSRL), which integrates component-level and voxel-level activity under an information parity framework and imposes ranked sparsity to guide component selection. We apply the models to brain activation during risk taking, monetary incentive, and emotion regulation tasks. Results demonstrate that incorporating sparsity ranking into LASSO PCR produces models with enhanced classification performance, with JSRL achieving up to 51.7% improvement in cross-validated deviance $R^2$ and 7.3% improvement in cross-validated AUC. Furthermore, sparsity-ranked models perform as well as or better than standard LASSO PCR approaches across all classification tasks and allocate predictive weight to brain regions consistent with their established functional roles, offering a robust alternative for MVPA.

Kernel VICReg for Self-Supervised Learning in Reproducing Kernel Hilbert Space

arXiv:2509.07289v1 Announce Type: new Abstract: Self-supervised learning (SSL) has emerged as a powerful paradigm for representation learning by optimizing geometric objectives--such as invariance to augmentations, variance preservation, and feature decorrelation--without requiring labels. However, most existing methods operate in Euclidean space, limiting their ability to capture nonlinear dependencies and geometric structures. In this work, we propose Kernel VICReg, a novel self-supervised learning framework that lifts the VICReg objective into a Reproducing Kernel Hilbert Space (RKHS). By kernelizing each term of the loss-variance, invariance, and covariance--we obtain a general formulation that operates on double-centered kernel matrices and Hilbert-Schmidt norms, enabling nonlinear feature learning without explicit mappings. We demonstrate that Kernel VICReg not only avoids representational collapse but also improves performance on tasks with complex or small-scale data. Empirical evaluations across MNIST, CIFAR-10, STL-10, TinyImageNet, and ImageNet100 show consistent gains over Euclidean VICReg, with particularly strong improvements on datasets where nonlinear structures are prominent. UMAP visualizations further confirm that kernel-based embeddings exhibit better isometry and class separation. Our results suggest that kernelizing SSL objectives is a promising direction for bridging classical kernel methods with modern representation learning.

Bayesian Pliable Lasso with Horseshoe Prior for Interaction Effects in GLMs with Missing Responses

arXiv:2509.07501v1 Announce Type: cross Abstract: Sparse regression problems, where the goal is to identify a small set of relevant predictors, often require modeling not only main effects but also meaningful interactions through other variables. While the pliable lasso has emerged as a powerful frequentist tool for modeling such interactions under strong heredity constraints, it lacks a natural framework for uncertainty quantification and incorporation of prior knowledge. In this paper, we propose a Bayesian pliable lasso that extends this approach by placing sparsity-inducing priors, such as the horseshoe, on both main and interaction effects. The hierarchical prior structure enforces heredity constraints while adaptively shrinking irrelevant coefficients and allowing important effects to persist. We extend this framework to Generalized Linear Models (GLMs) and develop a tailored approach to handle missing responses. To facilitate posterior inference, we develop an efficient Gibbs sampling algorithm based on a reparameterization of the horseshoe prior. Our Bayesian framework yields sparse, interpretable interaction structures, and principled measures of uncertainty. Through simulations and real-data studies, we demonstrate its advantages over existing methods in recovering complex interaction patterns under both complete and incomplete data. Our method is implemented in the package texttt{hspliable} available on Github.

uGMM-NN: Univariate Gaussian Mixture Model Neural Network

arXiv:2509.07569v1 Announce Type: cross Abstract: This paper introduces the Univariate Gaussian Mixture Model Neural Network (uGMM-NN), a novel neural architecture that embeds probabilistic reasoning directly into the computational units of deep networks. Unlike traditional neurons, which apply weighted sums followed by fixed nonlinearities, each uGMM-NN node parameterizes its activations as a univariate Gaussian mixture, with learnable means, variances, and mixing coefficients. This design enables richer representations by capturing multimodality and uncertainty at the level of individual neurons, while retaining the scalability of standard feedforward networks. We demonstrate that uGMM-NN can achieve competitive discriminative performance compared to conventional multilayer perceptrons, while additionally offering a probabilistic interpretation of activations. The proposed framework provides a foundation for integrating uncertainty-aware components into modern neural architectures, opening new directions for both discriminative and generative modeling.

On the Benefits of Public Representations for Private Transfer Learning under Distribution Shift

arXiv:2312.15551v5 Announce Type: replace-cross Abstract: Public pretraining is a promising approach to improve differentially private model training. However, recent work has noted that many positive research results studying this paradigm only consider in-distribution tasks, and may not apply to settings where there is distribution shift between the pretraining and finetuning data -- a scenario that is likely when finetuning private tasks due to the sensitive nature of the data. In this work, we show empirically across three tasks that even in settings with large distribution shift, where both zero-shot performance from public data and training from scratch with private data give unusably weak results, public features can in fact improve private training accuracy by up to 67% over private training from scratch. We provide a theoretical explanation for this phenomenon, showing that if the public and private data share a low-dimensional representation, public representations can improve the sample complexity of private training even if it is impossible to learn the private task from the public data alone. Altogether, our results provide evidence that public data can indeed make private training practical in realistic settings of extreme distribution shift.

Physics-informed low-rank neural operators with application to parametric elliptic PDEs

arXiv:2509.07687v1 Announce Type: cross Abstract: We present the Physics-Informed Low-Rank Neural Operator (PILNO), a neural operator framework for efficiently approximating solution operators of partial differential equations (PDEs) on point cloud data. PILNO combines low-rank kernel approximations with an encoder--decoder architecture, enabling fast, continuous one-shot predictions while remaining independent of specific discretizations. The model is trained using a physics-informed penalty framework, ensuring that PDE constraints and boundary conditions are satisfied in both supervised and unsupervised settings. We demonstrate its effectiveness on diverse problems, including function fitting, the Poisson equation, the screened Poisson equation with variable coefficients, and parameterized Darcy flow. The low-rank structure provides computational efficiency in high-dimensional parameter spaces, establishing PILNO as a scalable and flexible surrogate modeling tool for PDEs.

Improving the Estimation of Lifetime Effects in A/B Testing via Treatment Locality

arXiv:2407.19618v3 Announce Type: replace-cross Abstract: Utilizing randomized experiments to evaluate the effect of short-term treatments on the short-term outcomes has been well understood and become the golden standard in industrial practice. However, as service systems become increasingly dynamical and personalized, much focus is shifting toward maximizing long-term outcomes, such as customer lifetime value, through lifetime exposure to interventions. Our goal is to assess the impact of treatment and control policies on long-term outcomes from relatively short-term observations, such as those generated by A/B testing. A key managerial observation is that many practical treatments are local, affecting only targeted states while leaving other parts of the policy unchanged. This paper rigorously investigates whether and how such locality can be exploited to improve estimation of long-term effects in Markov Decision Processes (MDPs), a fundamental model of dynamic systems. We first develop optimal inference techniques for general A/B testing in MDPs and establish corresponding efficiency bounds. We then propose methods to harness the localized structure by sharing information on the non-targeted states. Our new estimator can achieve a linear reduction with the number of test arms for a major part of the variance without sacrificing unbiasedness. It also matches a tighter variance lower bound that accounts for locality. Furthermore, we extend our framework to a broad class of differentiable estimators, which encompasses many widely used approaches in practice. We show that all such estimators can benefit from variance reduction through information sharing without increasing their bias. Together, these results provide both theoretical foundations and practical tools for conducting efficient experiments in dynamic service systems with local treatments.

Feature Understanding and Sparsity Enhancement via 2-Layered kernel machines (2L-FUSE)

arXiv:2509.07806v1 Announce Type: cross Abstract: We propose a novel sparsity enhancement strategy for regression tasks, based on learning a data-adaptive kernel metric, i.e., a shape matrix, through 2-Layered kernel machines. The resulting shape matrix, which defines a Mahalanobis-type deformation of the input space, is then factorized via an eigen-decomposition, allowing us to identify the most informative directions in the space of features. This data-driven approach provides a flexible, interpretable and accurate feature reduction scheme. Numerical experiments on synthetic and applications to real datasets of geomagnetic storms demonstrate that our approach achieves minimal yet highly informative feature sets without losing predictive performance.