Archives AI News

PerFairX: Is There a Balance Between Fairness and Personality in Large Language Model Recommendations?

arXiv:2509.08829v1 Announce Type: cross Abstract: The integration of Large Language Models (LLMs) into recommender systems has enabled zero-shot, personality-based personalization through prompt-based interactions, offering a new paradigm for user-centric recommendations. However, incorporating user personality traits via the OCEAN model highlights a critical tension between achieving psychological alignment and ensuring demographic fairness. To address this, we propose PerFairX, a unified evaluation framework designed to quantify the trade-offs between personalization and demographic equity in LLM-generated recommendations. Using neutral and personality-sensitive prompts across diverse user profiles, we benchmark two state-of-the-art LLMs, ChatGPT and DeepSeek, on movie (MovieLens 10M) and music (Last.fm 360K) datasets. Our results reveal that personality-aware prompting significantly improves alignment with individual traits but can exacerbate fairness disparities across demographic groups. Specifically, DeepSeek achieves stronger psychological fit but exhibits higher sensitivity to prompt variations, while ChatGPT delivers stable yet less personalized outputs. PerFairX provides a principled benchmark to guide the development of LLM-based recommender systems that are both equitable and psychologically informed, contributing to the creation of inclusive, user-centric AI applications in continual learning contexts.

Average Causal Effect Estimation in DAGs with Hidden Variables: Beyond Back-Door and Front-Door Criteria

arXiv:2409.03962v2 Announce Type: replace-cross Abstract: The identification theory for causal effects in directed acyclic graphs (DAGs) with hidden variables is well established, but methods for estimating and inferring functionals that extend beyond the g-formula remain underdeveloped. Previous studies have introduced semiparametric estimators for such functionals in a broad class of DAGs with hidden variables. While these estimators exhibit desirable statistical properties such as double robustness in certain cases, they also face significant limitations. Notably, they encounter substantial computational challenges, particularly involving density estimation and numerical integration for continuous variables, and their estimates may fall outside the parameter space of the target estimand. Additionally, the asymptotic properties of these estimators is underexplored, especially when integrating flexible statistical and machine learning models for nuisance functional estimations. This paper addresses these challenges by introducing novel one-step corrected plug-in and targeted minimum loss-based estimators of causal effects for a class of hidden variable DAGs that go beyond classical back-door and front-door criteria (known as the treatment primal fixability criterion in prior literature). These estimators leverage data-adaptive machine learning algorithms to minimize modeling assumptions while ensuring key statistical properties including double robustness, efficiency, boundedness within the target parameter space, and asymptotic linearity under $L^2(P)$-rate conditions for nuisance functional estimates that yield root-n consistent causal effect estimates. To ensure our estimation methods are accessible in practice, we provide the flexCausal package in R.

Iterative Methods for Full-Scale Gaussian Process Approximations for Large Spatial Data

arXiv:2405.14492v4 Announce Type: replace-cross Abstract: Gaussian processes are flexible probabilistic regression models which are widely used in statistics and machine learning. However, a drawback is their limited scalability to large data sets. To alleviate this, full-scale approximations (FSAs) combine predictive process methods and covariance tapering, thus approximating both global and local structures. We show how iterative methods can be used to reduce computational costs in calculating likelihoods, gradients, and predictive distributions with FSAs. In particular, we introduce a novel preconditioner and show theoretically and empirically that it accelerates the conjugate gradient method's convergence speed and mitigates its sensitivity with respect to the FSA parameters and the eigenvalue structure of the original covariance matrix, and we demonstrate empirically that it outperforms a state-of-the-art pivoted Cholesky preconditioner. Furthermore, we introduce an accurate and fast way to calculate predictive variances using stochastic simulation and iterative methods. In addition, we show how our newly proposed FITC preconditioner can also be used in iterative methods for Vecchia approximations. In our experiments, it outperforms existing state-of-the-art preconditioners for Vecchia approximations. All methods are implemented in a free C++ software library with high-level Python and R packages.

Towards Robust Influence Functions with Flat Validation Minima

arXiv:2505.19097v2 Announce Type: replace-cross Abstract: The Influence Function (IF) is a widely used technique for assessing the impact of individual training samples on model predictions. However, existing IF methods often fail to provide reliable influence estimates in deep neural networks, particularly when applied to noisy training data. This issue does not stem from inaccuracies in parameter change estimation, which has been the primary focus of prior research, but rather from deficiencies in loss change estimation, specifically due to the sharpness of validation risk. In this work, we establish a theoretical connection between influence estimation error, validation set risk, and its sharpness, underscoring the importance of flat validation minima for accurate influence estimation. Furthermore, we introduce a novel estimation form of Influence Function specifically designed for flat validation minima. Experimental results across various tasks validate the superiority of our approach.

Global Optimization of Stochastic Black-Box Functions with Arbitrary Noise Distributions using Wilson Score Kernel Density Estimation

arXiv:2509.09238v1 Announce Type: new Abstract: Many optimization problems in robotics involve the optimization of time-expensive black-box functions, such as those involving complex simulations or evaluation of real-world experiments. Furthermore, these functions are often stochastic as repeated experiments are subject to unmeasurable disturbances. Bayesian optimization can be used to optimize such methods in an efficient manner by deploying a probabilistic function estimator to estimate with a given confidence so that regions of the search space can be pruned away. Consequently, the success of the Bayesian optimization depends on the function estimator's ability to provide informative confidence bounds. Existing function estimators require many function evaluations to infer the underlying confidence or depend on modeling of the disturbances. In this paper, it is shown that the confidence bounds provided by the Wilson Score Kernel Density Estimator (WS-KDE) are applicable as excellent bounds to any stochastic function with an output confined to the closed interval [0;1] regardless of the distribution of the output. This finding opens up the use of WS-KDE for stable global optimization on a wider range of cost functions. The properties of WS-KDE in the context of Bayesian optimization are demonstrated in simulation and applied to the problem of automated trap design for vibrational part feeders.

Expressive Power of Deep Networks on Manifolds: Simultaneous Approximation

arXiv:2509.09362v1 Announce Type: cross Abstract: A key challenge in scientific machine learning is solving partial differential equations (PDEs) on complex domains, where the curved geometry complicates the approximation of functions and their derivatives required by differential operators. This paper establishes the first simultaneous approximation theory for deep neural networks on manifolds. We prove that a constant-depth $mathrm{ReLU}^{k-1}$ network with bounded weights--a property that plays a crucial role in controlling generalization error--can approximate any function in the Sobolev space $mathcal{W}_p^{k}(mathcal{M}^d)$ to an error of $varepsilon$ in the $mathcal{W}_p^{s}(mathcal{M}^d)$ norm, for $kgeq 3$ and $s<k$, using $mathcal{O}(varepsilon^{-d/(k-s)})$ nonzero parameters, a rate that overcomes the curse of dimensionality by depending only on the intrinsic dimension $d$. These results readily extend to functions in H"older-Zygmund spaces. We complement this result with a matching lower bound, proving our construction is nearly optimal by showing the required number of parameters matches up to a logarithmic factor. Our proof of the lower bound introduces novel estimates for the Vapnik-Chervonenkis dimension and pseudo-dimension of the network's high-order derivative classes. These complexity bounds provide a theoretical cornerstone for learning PDEs on manifolds involving derivatives. Our analysis reveals that the network architecture leverages a sparse structure to efficiently exploit the manifold's low-dimensional geometry.

Similarity-based Outlier Detection for Noisy Object Re-Identification Using Beta Mixtures

arXiv:2509.08926v1 Announce Type: cross Abstract: Object re-identification (Re-ID) methods are highly sensitive to label noise, which typically leads to significant performance degradation. We address this challenge by reframing Re-ID as a supervised image similarity task and adopting a Siamese network architecture trained to capture discriminative pairwise relationships. Central to our approach is a novel statistical outlier detection (OD) framework, termed Beta-SOD (Beta mixture Similarity-based Outlier Detection), which models the distribution of cosine similarities between embedding pairs using a two-component Beta distribution mixture model. We establish a novel identifiability result for mixtures of two Beta distributions, ensuring that our learning task is well-posed.The proposed OD step complements the Re-ID architecture combining binary cross-entropy, contrastive, and cosine embedding losses that jointly optimize feature-level similarity learning.We demonstrate the effectiveness of Beta-SOD in de-noising and Re-ID tasks for person Re-ID, on CUHK03 and Market-1501 datasets, and vehicle Re-ID, on VeRi-776 dataset. Our method shows superior performance compared to the state-of-the-art methods across various noise levels (10-30%), demonstrating both robustness and broad applicability in noisy Re-ID scenarios. The implementation of Beta-SOD is available at: https://github.com/waqar3411/Beta-SOD

Instance-Optimal Matrix Multiplicative Weight Update and Its Quantum Applications

arXiv:2509.08911v1 Announce Type: cross Abstract: The Matrix Multiplicative Weight Update (MMWU) is a seminal online learning algorithm with numerous applications. Applied to the matrix version of the Learning from Expert Advice (LEA) problem on the $d$-dimensional spectraplex, it is well known that MMWU achieves the minimax-optimal regret bound of $O(sqrt{Tlog d})$, where $T$ is the time horizon. In this paper, we present an improved algorithm achieving the instance-optimal regret bound of $O(sqrt{Tcdot S(X||d^{-1}I_d)})$, where $X$ is the comparator in the regret, $I_d$ is the identity matrix, and $S(cdot||cdot)$ denotes the quantum relative entropy. Furthermore, our algorithm has the same computational complexity as MMWU, indicating that the improvement in the regret bound is ``free''. Technically, we first develop a general potential-based framework for matrix LEA, with MMWU being its special case induced by the standard exponential potential. Then, the crux of our analysis is a new ``one-sided'' Jensen's trace inequality built on a Laplace transform technique, which allows the application of general potential functions beyond exponential to matrix LEA. Our algorithm is finally induced by an optimal potential function from the vector LEA problem, based on the imaginary error function. Complementing the above, we provide a memory lower bound for matrix LEA, and explore the applications of our algorithm in quantum learning theory. We show that it outperforms the state of the art for learning quantum states corrupted by depolarization noise, random quantum states, and Gibbs states. In addition, applying our algorithm to linearized convex losses enables predicting nonlinear quantum properties, such as purity, quantum virtual cooling, and R'{e}nyi-$2$ correlation.

Uncertainty Estimation using Variance-Gated Distributions

arXiv:2509.08846v1 Announce Type: cross Abstract: Evaluation of per-sample uncertainty quantification from neural networks is essential for decision-making involving high-risk applications. A common approach is to use the predictive distribution from Bayesian or approximation models and decompose the corresponding predictive uncertainty into epistemic (model-related) and aleatoric (data-related) components. However, additive decomposition has recently been questioned. In this work, we propose an intuitive framework for uncertainty estimation and decomposition based on the signal-to-noise ratio of class probability distributions across different model predictions. We introduce a variance-gated measure that scales predictions by a confidence factor derived from ensembles. We use this measure to discuss the existence of a collapse in the diversity of committee machines.

Low-degree lower bounds via almost orthonormal bases

arXiv:2509.09353v1 Announce Type: new Abstract: Low-degree polynomials have emerged as a powerful paradigm for providing evidence of statistical-computational gaps across a variety of high-dimensional statistical models [Wein25]. For detection problems -- where the goal is to test a planted distribution $mathbb{P}'$ against a null distribution $mathbb{P}$ with independent components -- the standard approach is to bound the advantage using an $mathbb{L}^2(mathbb{P})$-orthonormal family of polynomials. However, this method breaks down for estimation tasks or more complex testing problems where $mathbb{P}$ has some planted structures, so that no simple $mathbb{L}^2(mathbb{P})$-orthogonal polynomial family is available. To address this challenge, several technical workarounds have been proposed [SW22,SW25], though their implementation can be delicate. In this work, we propose a more direct proof strategy. Focusing on random graph models, we construct a basis of polynomials that is almost orthonormal under $mathbb{P}$, in precisely those regimes where statistical-computational gaps arise. This almost orthonormal basis not only yields a direct route to establishing low-degree lower bounds, but also allows us to explicitly identify the polynomials that optimize the low-degree criterion. This, in turn, provides insights into the design of optimal polynomial-time algorithms. We illustrate the effectiveness of our approach by recovering known low-degree lower bounds, and establishing new ones for problems such as hidden subcliques, stochastic block models, and seriation models.