Archives AI News

The Nondecreasing Rank

arXiv:2509.00265v1 Announce Type: new Abstract: In this article the notion of the nondecreasing (ND) rank of a matrix or tensor is introduced. A tensor has an ND rank of r if it can be represented as a sum of r outer products of vectors, with each vector satisfying a monotonicity constraint. It is shown that for certain poset orderings finding an ND factorization of rank $r$ is equivalent to finding a nonnegative rank-r factorization of a transformed tensor. However, not every tensor that is monotonic has a finite ND rank. Theory is developed describing the properties of the ND rank, including typical, maximum, and border ND ranks. Highlighted also are the special settings where a matrix or tensor has an ND rank of one or two. As a means of finding low ND rank approximations to a data tensor we introduce a variant of the hierarchical alternating least squares algorithm. Low ND rank factorizations are found and interpreted for two datasets concerning the weight of pigs and a mental health survey during the COVID-19 pandemic.

Probit Monotone BART

arXiv:2509.00263v1 Announce Type: new Abstract: Bayesian Additive Regression Trees (BART) of Chipman et al. (2010) has proven to be a powerful tool for nonparametric modeling and prediction. Monotone BART (Chipman et al., 2022) is a recent development that allows BART to be more precise in estimating monotonic functions. We further these developments by proposing probit monotone BART, which allows the monotone BART framework to estimate conditional mean functions when the outcome variable is binary.

Learning in complex action spaces without policy gradients

arXiv:2410.06317v2 Announce Type: replace-cross Abstract: While conventional wisdom holds that policy gradient methods are better suited to complex action spaces than action-value methods, foundational work has shown that the two paradigms are equivalent in small, finite action spaces (O'Donoghue et al., 2017; Schulman et al., 2017a). This raises the question of why their computational applicability and performance diverge as the complexity of the action space increases. We hypothesize that the apparent superiority of policy gradients in such settings stems not from intrinsic qualities of the paradigm but from universal principles that can also be applied to action-value methods, enabling similar functions. We identify three such principles and provide a framework for incorporating them into action-value methods. To support our hypothesis, we instantiate this framework in what we term QMLE, for Q-learning with maximum likelihood estimation. Our results show that QMLE can be applied to complex action spaces at a computational cost comparable to that of policy gradient methods, all without using policy gradients. Furthermore, QMLE exhibits strong performance on the DeepMind Control Suite, even when compared to state-of-the-art methods such as DMPO and D4PG.

Hybrid Topic-Semantic Labeling and Graph Embeddings for Unsupervised Legal Document Clustering

arXiv:2509.00990v1 Announce Type: new Abstract: Legal documents pose unique challenges for text classification due to their domain-specific language and often limited labeled data. This paper proposes a hybrid approach for classifying legal texts by combining unsupervised topic and graph embeddings with a supervised model. We employ Top2Vec to learn semantic document embeddings and automatically discover latent topics, and Node2Vec to capture structural relationships via a bipartite graph of legal documents. The embeddings are combined and clustered using KMeans, yielding coherent groupings of documents. Our computations on a legal document dataset demonstrate that the combined Top2Vec+Node2Vec approach improves clustering quality over text-only or graph-only embeddings. We conduct a sensitivity analysis of hyperparameters, such as the number of clusters and the dimensionality of the embeddings, and demonstrate that our method achieves competitive performance against baseline Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF) models. Key findings indicate that while the pipeline presents an innovative approach to unsupervised legal document analysis by combining semantic topic modeling with graph embedding techniques, its efficacy is contingent upon the quality of initial topic generation and the representational power of the chosen embedding models for specialized legal language. Strategic recommendations include the exploration of domain-specific embeddings, more comprehensive hyperparameter tuning for Node2Vec, dynamic determination of cluster numbers, and robust human-in-the-loop validation processes to enhance legal relevance and trustworthiness. The pipeline demonstrates potential for exploratory legal data analysis and as a precursor to supervised learning tasks but requires further refinement and domain-specific adaptation for practical legal applications.

Memory Capacity of Nonlinear Recurrent Networks: Is it Informative?

arXiv:2502.04832v2 Announce Type: replace-cross Abstract: The total memory capacity (MC) of linear recurrent neural networks (RNNs) has been proven to be equal to the rank of the corresponding Kalman controllability matrix, and it is almost surely maximal for connectivity and input weight matrices drawn from regular distributions. This fact questions the usefulness of this metric in distinguishing the performance of linear RNNs in the processing of stochastic signals. This work shows that the MC of random nonlinear RNNs yields arbitrary values within established upper and lower bounds depending exclusively on the scale of the input process. This confirms that the existing definition of MC in linear and nonlinear cases has no practical value.

Lipschitz-Guided Design of Interpolation Schedules in Generative Models

arXiv:2509.01629v1 Announce Type: new Abstract: We study the design of interpolation schedules in the stochastic interpolants framework for flow and diffusion-based generative models. We show that while all scalar interpolation schedules achieve identical statistical efficiency under Kullback-Leibler divergence in path space after optimal diffusion coefficient tuning, their numerical efficiency can differ substantially. This observation motivates focusing on numerical properties of the resulting drift fields rather than statistical criteria for schedule design. We propose averaged squared Lipschitzness minimization as a principled criterion for numerical optimization, providing an alternative to kinetic energy minimization used in optimal transport approaches. A transfer formula is derived that enables conversion between different schedules at inference time without retraining neural networks. For Gaussian distributions, our optimized schedules achieve exponential improvements in Lipschitz constants over standard linear schedules, while for Gaussian mixtures, they reduce mode collapse in few-step sampling. We also validate our approach on high-dimensional invariant distributions from stochastic Allen-Cahn equations and Navier-Stokes equations, demonstrating robust performance improvements across resolutions.

Solving dynamic portfolio selection problems via score-based diffusion models

arXiv:2507.09916v3 Announce Type: replace-cross Abstract: In this paper, we tackle the dynamic mean-variance portfolio selection problem in a {it model-free} manner, based on (generative) diffusion models. We propose using data sampled from the real model $mathbb P$ (which is unknown) with limited size to train a generative model $mathbb Q$ (from which we can easily and adequately sample). With adaptive training and sampling methods that are tailor-made for time series data, we obtain quantification bounds between $mathbb P$ and $mathbb Q$ in terms of the adapted Wasserstein metric $mathcal A W_2$. Importantly, the proposed adapted sampling method also facilitates {it conditional sampling}. In the second part of this paper, we provide the stability of the mean-variance portfolio optimization problems in $mathcal A W _2$. Then, combined with the error bounds and the stability result, we propose a policy gradient algorithm based on the generative environment, in which our innovative adapted sampling method provides approximate scenario generators. We illustrate the performance of our algorithm on both simulated and real data. For real data, the algorithm based on the generative environment produces portfolios that beat several important baselines, including the Markowitz portfolio, the equal weight (naive) portfolio, and S&P 500.

Preconditioned Regularized Wasserstein Proximal Sampling

arXiv:2509.01685v1 Announce Type: new Abstract: We consider sampling from a Gibbs distribution by evolving finitely many particles. We propose a preconditioned version of a recently proposed noise-free sampling method, governed by approximating the score function with the numerically tractable score of a regularized Wasserstein proximal operator. This is derived by a Cole--Hopf transformation on coupled anisotropic heat equations, yielding a kernel formulation for the preconditioned regularized Wasserstein proximal. The diffusion component of the proposed method is also interpreted as a modified self-attention block, as in transformer architectures. For quadratic potentials, we provide a discrete-time non-asymptotic convergence analysis and explicitly characterize the bias, which is dependent on regularization and independent of step-size. Experiments demonstrate acceleration and particle-level stability on various log-concave and non-log-concave toy examples to Bayesian total-variation regularized image deconvolution, and competitive/better performance on non-convex Bayesian neural network training when utilizing variable preconditioning matrices.

Gaming and Cooperation in Federated Learning: What Can Happen and How to Monitor It

arXiv:2509.02391v1 Announce Type: cross Abstract: The success of Federated Learning depends on the actions that participants take out of sight. We model Federated Learning not as a mere optimization task but as a strategic system entangled with rules and incentives. From this perspective, we present an analytical framework that makes it possible to clearly identify where behaviors that genuinely improve performance diverge from those that merely target metrics. We introduce two indices that respectively quantify behavioral incentives and collective performance loss, and we use them as the basis for consistently interpreting the impact of operational choices such as rule design, the level of information disclosure, evaluation methods, and aggregator switching. We further summarize thresholds, auto-switch rules, and early warning signals into a checklist that can be applied directly in practice, and we provide both a practical algorithm for allocating limited audit resources and a performance guarantee. Simulations conducted across diverse environments consistently validate the patterns predicted by our framework, and we release all procedures for full reproducibility. While our approach operates most strongly under several assumptions, combining periodic recalibration, randomization, and connectivity-based alarms enables robust application under the variability of real-world operations. We present both design principles and operational guidelines that lower the incentives for metric gaming while sustaining and expanding stable cooperation.

The Price of Sparsity: Sufficient Conditions for Sparse Recovery using Sparse and Sparsified Measurements

arXiv:2509.01809v1 Announce Type: new Abstract: We consider the problem of recovering the support of a sparse signal using noisy projections. While extensive work has been done on the dense measurement matrix setting, the sparse setting remains less explored. In this work, we establish sufficient conditions on the sample size for successful sparse recovery using sparse measurement matrices. Bringing together our result with previously known necessary conditions, we discover that, in the regime where $ds/p rightarrow +infty$, sparse recovery in the sparse setting exhibits a phase transition at an information-theoretic threshold of $n_{text{INF}}^{text{SP}} = Thetaleft(slogleft(p/sright)/logleft(ds/pright)right)$, where $p$ denotes the signal dimension, $s$ the number of non-zero components of the signal, and $d$ the expected number of non-zero components per row of measurement. This expression makes the price of sparsity explicit: restricting each measurement to $d$ non-zeros inflates the required sample size by a factor of $log{s}/logleft(ds/pright)$, revealing a precise trade-off between sampling complexity and measurement sparsity. Additionally, we examine the effect of sparsifying an originally dense measurement matrix on sparse signal recovery. We prove in the regime of $s = alpha p$ and $d = psi p$ with $alpha, psi in left(0,1right)$ and $psi$ small that a sample of size $n^{text{Sp-ified}}_{text{INF}} = Thetaleft(p / psi^2right)$ is sufficient for recovery, subject to a certain uniform integrability conjecture, the proof of which is work in progress.