Archives AI News

Solving dynamic portfolio selection problems via score-based diffusion models

arXiv:2507.09916v3 Announce Type: replace-cross Abstract: In this paper, we tackle the dynamic mean-variance portfolio selection problem in a \textit{model-free} manner, based on (generative) diffusion models. We propose using data of limited size sampled from the real (unknown) model $\mathbb{P}$ to train a generative model $\mathbb{Q}$ (from which we can easily and adequately sample). With adaptive training and sampling methods tailor-made for time-series data, we obtain quantification bounds between $\mathbb{P}$ and $\mathbb{Q}$ in terms of the adapted Wasserstein metric $\mathcal{AW}_2$. Importantly, the proposed adapted sampling method also facilitates \textit{conditional sampling}. In the second part of this paper, we establish the stability of mean-variance portfolio optimization problems in $\mathcal{AW}_2$. Then, combining the error bounds with the stability result, we propose a policy gradient algorithm based on the generative environment, in which our adapted sampling method provides approximate scenario generators. We illustrate the performance of our algorithm on both simulated and real data. On real data, the algorithm based on the generative environment produces portfolios that beat several important baselines, including the Markowitz portfolio, the equal-weight (naive) portfolio, and the S&P 500.
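
For reference, the adapted Wasserstein metric has a standard definition in the adapted-transport literature, which the abstract's $\mathcal{AW}_2$ presumably follows: it restricts the usual optimal-transport couplings to bi-causal ones that respect the flow of information in time,

$$\mathcal{AW}_2(\mathbb{P},\mathbb{Q}) \;=\; \inf_{\pi \in \Pi_{\mathrm{bc}}(\mathbb{P},\mathbb{Q})} \Big( \mathbb{E}_{\pi} \sum_{t=1}^{T} \|x_t - y_t\|^2 \Big)^{1/2},$$

where $\Pi_{\mathrm{bc}}(\mathbb{P},\mathbb{Q})$ denotes the set of bi-causal couplings of the two path measures. This adaptedness is what makes such bounds meaningful for dynamic optimization, where ordinary Wasserstein closeness is too weak.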

Lipschitz-Guided Design of Interpolation Schedules in Generative Models

arXiv:2509.01629v1 Announce Type: new Abstract: We study the design of interpolation schedules in the stochastic interpolants framework for flow- and diffusion-based generative models. We show that while all scalar interpolation schedules achieve identical statistical efficiency under Kullback-Leibler divergence in path space after optimal diffusion-coefficient tuning, their numerical efficiency can differ substantially. This observation motivates focusing on the numerical properties of the resulting drift fields, rather than statistical criteria, for schedule design. We propose averaged squared Lipschitzness minimization as a principled criterion for numerical optimization, providing an alternative to the kinetic energy minimization used in optimal transport approaches. A transfer formula is derived that enables conversion between different schedules at inference time without retraining neural networks. For Gaussian distributions, our optimized schedules achieve exponential improvements in Lipschitz constants over standard linear schedules, while for Gaussian mixtures, they reduce mode collapse in few-step sampling. We also validate our approach on high-dimensional invariant distributions from stochastic Allen-Cahn equations and Navier-Stokes equations, demonstrating robust performance improvements across resolutions.
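
To make the schedule-dependence concrete, the sketch below evaluates the Lipschitz constant of the probability-flow drift for a 1D Gaussian-to-Gaussian interpolant $x_t = \alpha(t)x_0 + \beta(t)x_1$, where the constant has a closed form by Gaussian conditioning. The endpoint parameters and the two schedules compared are illustrative assumptions, not the paper's optimized schedules.

```python
import numpy as np

# 1D stochastic interpolant x_t = alpha(t) x0 + beta(t) x1 with
# x0 ~ N(0, 1) and x1 ~ N(m, sigma^2) independent. The probability-flow
# drift b(t, x) = E[alpha' x0 + beta' x1 | x_t = x] is linear in x, with slope
#   (alpha' alpha + beta' beta sigma^2) / (alpha^2 + beta^2 sigma^2),
# which is exactly its Lipschitz constant in x.
m, sigma = 3.0, 0.2  # illustrative endpoint distribution N(m, sigma^2)

def drift_lipschitz(alpha, dalpha, beta, dbeta, t):
    a, da, b, db = alpha(t), dalpha(t), beta(t), dbeta(t)
    return np.abs(da * a + db * b * sigma**2) / (a**2 + b**2 * sigma**2)

t = np.linspace(1e-3, 1 - 1e-3, 1000)

# Linear schedule: alpha = 1 - t, beta = t.
lin = drift_lipschitz(lambda t: 1 - t, lambda t: -np.ones_like(t),
                      lambda t: t, lambda t: np.ones_like(t), t)

# Trigonometric schedule: alpha = cos(pi t / 2), beta = sin(pi t / 2).
trig = drift_lipschitz(lambda t: np.cos(np.pi * t / 2),
                       lambda t: -np.pi / 2 * np.sin(np.pi * t / 2),
                       lambda t: np.sin(np.pi * t / 2),
                       lambda t: np.pi / 2 * np.cos(np.pi * t / 2), t)

# The max over t is the quantity whose (averaged squared) analogue the
# paper proposes to minimize when designing schedules.
print(f"max drift Lipschitz, linear schedule: {lin.max():.2f}")
print(f"max drift Lipschitz, trig schedule:   {trig.max():.2f}")
```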

Memory Capacity of Nonlinear Recurrent Networks: Is it Informative?

arXiv:2502.04832v2 Announce Type: replace-cross Abstract: The total memory capacity (MC) of linear recurrent neural networks (RNNs) has been proven to equal the rank of the corresponding Kalman controllability matrix, and it is almost surely maximal for connectivity and input weight matrices drawn from regular distributions. This fact calls into question the usefulness of the metric for distinguishing the performance of linear RNNs in the processing of stochastic signals. This work shows that the MC of random nonlinear RNNs can attain arbitrary values within established upper and lower bounds, depending exclusively on the scale of the input process. This confirms that the existing definition of MC, in both the linear and nonlinear cases, has no practical value.
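
As a concrete companion to the claim, the sketch below estimates the total memory capacity $MC = \sum_k r^2(k)$ of a random tanh reservoir with the standard linear-readout estimator (Jaeger's definition) and shows how the estimate moves with the input scale alone; network size, spectral radius, and the delay cutoff are illustrative choices, not values from the paper.

```python
import numpy as np

def memory_capacity(input_scale, n=100, T=4000, washout=500, max_delay=150,
                    spectral_radius=0.9, seed=0):
    """Estimate MC = sum_k r^2(k) of a random tanh reservoir driven by
    i.i.d. Gaussian input of the given scale, where r(k) is the correlation
    between the best linear readout of the state and the k-step-delayed input."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n, n))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    w_in = rng.normal(size=n)
    u = input_scale * rng.normal(size=T)
    H, h = np.zeros((T, n)), np.zeros(n)
    for t in range(T):
        h = np.tanh(W @ h + w_in * u[t])
        H[t] = h
    H, u = H[washout:], u[washout:]
    mc = 0.0
    for k in range(1, max_delay + 1):
        X, y = H[k:], u[:-k]  # reconstruct u_{t-k} from the state h_t
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        mc += np.corrcoef(X @ coef, y)[0, 1] ** 2  # finite-sample, biased upward
    return mc

# The same reservoir, driven at different input scales, yields very
# different MC estimates -- the scale-dependence the abstract highlights.
for s in (0.01, 0.1, 1.0):
    print(f"input scale {s:>5}: MC ~ {memory_capacity(s):.1f}")
```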

Hybrid Topic-Semantic Labeling and Graph Embeddings for Unsupervised Legal Document Clustering

arXiv:2509.00990v1 Announce Type: new Abstract: Legal documents pose unique challenges for text classification due to their domain-specific language and often limited labeled data. This paper proposes a hybrid approach to classifying legal texts that combines unsupervised topic and graph embeddings with a supervised model. We employ Top2Vec to learn semantic document embeddings and automatically discover latent topics, and Node2Vec to capture structural relationships via a bipartite graph of legal documents. The embeddings are combined and clustered using KMeans, yielding coherent groupings of documents. Our experiments on a legal document dataset demonstrate that the combined Top2Vec+Node2Vec approach improves clustering quality over text-only or graph-only embeddings. We conduct a sensitivity analysis of hyperparameters, such as the number of clusters and the dimensionality of the embeddings, and demonstrate that our method achieves competitive performance against baseline Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF) models. Key findings indicate that while the pipeline offers an innovative approach to unsupervised legal document analysis by combining semantic topic modeling with graph embedding techniques, its efficacy is contingent on the quality of the initial topic generation and the representational power of the chosen embedding models for specialized legal language. Strategic recommendations include exploring domain-specific embeddings, more comprehensive hyperparameter tuning for Node2Vec, dynamic determination of cluster numbers, and robust human-in-the-loop validation to enhance legal relevance and trustworthiness. The pipeline shows promise for exploratory legal data analysis and as a precursor to supervised learning tasks, but requires further refinement and domain-specific adaptation for practical legal applications.
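
A minimal sketch of the described pipeline, under stated assumptions: it uses the `top2vec` and `node2vec` PyPI packages (attribute names such as `document_vectors` may vary by version), and the bipartite graph is stood up from generic document-entity edges, since the abstract does not spell out the graph's construction.

```python
import numpy as np
import networkx as nx
from top2vec import Top2Vec
from node2vec import Node2Vec
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def cluster_legal_docs(documents, doc_edges, n_clusters=10, graph_dim=64):
    """documents: list of raw legal texts; doc_edges: (doc_index, entity) pairs
    defining a bipartite graph (a stand-in for the paper's unspecified graph)."""
    # 1. Semantic embeddings and latent topics via Top2Vec.
    t2v = Top2Vec(documents, speed="learn", workers=4)
    sem = t2v.document_vectors  # (n_docs, sem_dim)

    # 2. Structural embeddings via Node2Vec random walks on the bipartite graph.
    g = nx.Graph()
    g.add_edges_from((f"d{i}", f"e{e}") for i, e in doc_edges)
    n2v = Node2Vec(g, dimensions=graph_dim, walk_length=30, num_walks=100,
                   workers=4)
    wv = n2v.fit(window=10, min_count=1).wv
    struct = np.array([wv[f"d{i}"] for i in range(len(documents))])

    # 3. Scale each view, concatenate, and cluster with KMeans.
    combined = np.hstack([StandardScaler().fit_transform(sem),
                          StandardScaler().fit_transform(struct)])
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(combined)
```

Scaling each view before concatenation keeps one embedding space from dominating the KMeans distance; the paper's exact fusion rule is not given, so equal weighting is an assumption here.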

Learning in complex action spaces without policy gradients

arXiv:2410.06317v2 Announce Type: replace-cross Abstract: While conventional wisdom holds that policy gradient methods are better suited to complex action spaces than action-value methods, foundational work has shown that the two paradigms are equivalent in small, finite action spaces (O'Donoghue et al., 2017; Schulman et al., 2017a). This raises the question of why their computational applicability and performance diverge as the complexity of the action space increases. We hypothesize that the apparent superiority of policy gradients in such settings stems not from intrinsic qualities of the paradigm, but from universal principles that can also be applied to action-value methods, enabling similar capabilities. We identify three such principles and provide a framework for incorporating them into action-value methods. To support our hypothesis, we instantiate this framework in what we term QMLE, for Q-learning with maximum likelihood estimation. Our results show that QMLE can be applied to complex action spaces at a computational cost comparable to that of policy gradient methods, all without using policy gradients. Furthermore, QMLE exhibits strong performance on the DeepMind Control Suite, even when compared to state-of-the-art methods such as DMPO and D4PG.
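
The abstract leaves QMLE's construction unspecified, so the sketch below is not the paper's algorithm. It illustrates the adjacent, generic idea of acting greedily with respect to a learned Q-function over a continuous action space without policy gradients: a cross-entropy-method loop whose proposal update is a maximum-likelihood refit to the elite samples. `q_fn`, the Gaussian proposal, and all hyperparameters are assumptions for illustration.

```python
import numpy as np

def cem_maximize_q(q_fn, state, action_dim, iters=5, pop=256, elite_frac=0.1,
                   seed=0):
    """Approximate argmax_a Q(state, a) over a continuous action space by
    repeatedly refitting a Gaussian proposal to the highest-scoring samples;
    each refit is the maximum-likelihood estimate on the elite set."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(action_dim), np.ones(action_dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        actions = rng.normal(mu, sigma, size=(pop, action_dim))
        elite = actions[np.argsort(q_fn(state, actions))[-n_elite:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6  # MLE refit
    return mu

# Toy check with a known optimum at a = (1, -2):
q = lambda s, a: -np.sum((a - np.array([1.0, -2.0])) ** 2, axis=1)
print(cem_maximize_q(q, state=None, action_dim=2))
```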

Probit Monotone BART

arXiv:2509.00263v1 Announce Type: new Abstract: Bayesian Additive Regression Trees (BART) of Chipman et al. (2010) has proven to be a powerful tool for nonparametric modeling and prediction. Monotone BART (Chipman et al., 2022) is a recent development that allows BART to estimate monotonic functions more precisely. We build on these developments by proposing probit monotone BART, which extends the monotone BART framework to the estimation of conditional mean functions when the outcome variable is binary.
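
For context, the standard probit-BART construction (Chipman et al., 2010), which the proposed model presumably combines with monotone trees, models the binary outcome as

$$\Pr(Y = 1 \mid x) \;=\; \Phi\big(f(x)\big), \qquad f(x) \;=\; \sum_{j=1}^{m} g(x;\, T_j, M_j),$$

where $\Phi$ is the standard normal CDF and each $g(x; T_j, M_j)$ is a regression tree, here constrained to be monotone in designated covariates. Posterior sampling in probit BART typically relies on the Albert-Chib augmentation $Z = f(x) + \varepsilon$ with $\varepsilon \sim N(0,1)$ and $Y = \mathbf{1}\{Z > 0\}$, which reduces each update to the Gaussian-outcome case.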

The Nondecreasing Rank

arXiv:2509.00265v1 Announce Type: new Abstract: In this article the notion of the nondecreasing (ND) rank of a matrix or tensor is introduced. A tensor has ND rank at most $r$ if it can be represented as a sum of $r$ outer products of vectors, with each vector satisfying a monotonicity constraint. It is shown that for certain poset orderings, finding an ND factorization of rank $r$ is equivalent to finding a nonnegative rank-$r$ factorization of a transformed tensor. However, not every monotonic tensor has a finite ND rank. Theory is developed describing the properties of the ND rank, including the typical, maximum, and border ND ranks. Also highlighted are the special settings in which a matrix or tensor has an ND rank of one or two. As a means of finding low-ND-rank approximations to a data tensor, we introduce a variant of the hierarchical alternating least squares algorithm. Low-ND-rank factorizations are found and interpreted for two datasets, one concerning the weight of pigs and one from a mental health survey conducted during the COVID-19 pandemic.
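
Spelled out, the definition says a tensor $T$ has ND rank at most $r$ if it admits a decomposition

$$T \;=\; \sum_{i=1}^{r} a_i \otimes b_i \otimes c_i,$$

where the entries of each factor vector $a_i$, $b_i$, $c_i$ are nondecreasing with respect to the chosen (poset) ordering of its index set, and the ND rank of $T$ is the smallest such $r$. (The three-way case is shown for concreteness; the matrix case drops one factor.)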

Partial Functional Dynamic Backdoor Diffusion-based Causal Model

arXiv:2509.00472v1 Announce Type: new Abstract: We introduce a Partial Functional Dynamic Backdoor Diffusion-based Causal Model (PFD-BDCM), specifically designed for causal inference in the presence of unmeasured confounders with spatial heterogeneity and temporal dependency. The proposed PFD-BDCM framework addresses the limitations of existing approaches by integrating models for complex spatio-temporal dynamics with the analysis of multi-resolution variables. Specifically, the framework systematically mitigates confounding bias by integrating valid backdoor adjustment sets into a diffusion-based sampling mechanism. Moreover, it accounts for the intricate dynamics of unmeasured confounders through region-specific structural equations and conditional autoregressive processes, and it accommodates variables observed at heterogeneous resolutions via basis expansions for functional data. Our theoretical analysis establishes error bounds for PFD-BDCM's counterfactual estimates, formally linking reconstruction accuracy to counterfactual fidelity under monotonicity assumptions on the structural equations and invertibility assumptions on the encoding functions. Empirical evaluations on synthetic datasets and real-world air pollution data demonstrate PFD-BDCM's superiority over existing methods.
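
The backdoor adjustment that the sampling mechanism builds on is the standard one: for a valid adjustment set $Z$,

$$P(y \mid \mathrm{do}(x)) \;=\; \sum_{z} P(y \mid x, z)\, P(z),$$

so that, per the abstract, sampling from the diffusion model conditioned on $(x, z)$ and averaging over the adjustment variables yields interventional rather than merely observational quantities.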

Beyond Universal Approximation Theorems: Algorithmic Uniform Approximation by Neural Networks Trained with Noisy Data

arXiv:2509.00924v1 Announce Type: new Abstract: At its core, machine learning seeks to train models that reliably generalize beyond noisy observations; however, the theoretical vacuum in which state-of-the-art universal approximation theorems (UATs) operate isolates them from this goal, as they assume noiseless data and allow network parameters to be chosen freely, independent of algorithmic realism. This paper bridges that gap by introducing an architecture-specific randomized training algorithm that constructs a uniform approximator from $N$ noisy training samples on the $d$-dimensional cube $[0,1]^d$. Our trained neural networks attain the minimax-optimal number of \textit{trainable} (non-random) parameters, up to logarithmic factors that vanish under the idealized noiseless sampling assumed in classical UATs. Additionally, our trained models replicate key behaviours of real-world neural networks, absent in standard UAT constructions, by: (1) exhibiting sub-linear parametric complexity when fine-tuning on structurally related and favourable out-of-distribution tasks, (2) exactly interpolating the training data, and (3) maintaining reasonable Lipschitz regularity (after the initial clustering attention layer). These properties bring state-of-the-art UATs closer to practical machine learning, shifting the central open question from algorithmic implementability with noisy samples to whether stochastic gradient descent can achieve comparable guarantees.

Identifying Causal Direction via Dense Functional Classes

arXiv:2509.00538v1 Announce Type: new Abstract: We address the problem of determining the causal direction between two univariate, continuous-valued variables, X and Y, under the assumption of no hidden confounders. In general, it is not possible to make definitive statements about causality without some assumptions on the underlying model. To distinguish cause from effect, we propose a bivariate causal score based on the Minimum Description Length (MDL) principle, using functions that possess the density property on a compact real interval. We prove the identifiability of these causal scores under specific conditions, which can be easily tested. Gaussianity of the noise in the causal model equations is not assumed, only that the noise level is low. The well-studied class of cubic splines possesses the density property on a compact real interval. We propose LCUBE as an instantiation of the MDL-based causal score utilizing cubic regression splines. LCUBE is an identifiable method that is also interpretable, simple, and very fast, with only one hyperparameter. Empirical evaluations against state-of-the-art methods demonstrate that LCUBE achieves superior precision in terms of AUDRC on the real-world Tuebingen cause-effect pairs dataset. It also shows superior average precision across 10 common benchmark datasets and above-average precision on 13 datasets.
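
A minimal sketch of the underlying fit-and-compare idea, assuming scikit-learn's `SplineTransformer`: residual variance after a cubic regression-spline fit serves as a crude stand-in for the paper's MDL score, so this is not LCUBE itself, and `n_knots` is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

def spline_resid_var(x, y, n_knots=10):
    """Residual variance of a cubic regression-spline fit y ~ f(x)."""
    model = make_pipeline(SplineTransformer(degree=3, n_knots=n_knots),
                          LinearRegression())
    model.fit(x.reshape(-1, 1), y)
    return np.var(y - model.predict(x.reshape(-1, 1)))

def infer_direction(x, y):
    # Standardize so residual variances are comparable across directions.
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    # Lower residual variance ~ shorter description length (crude MDL proxy).
    return "X->Y" if spline_resid_var(x, y) < spline_resid_var(y, x) else "Y->X"

# Toy example with true direction X -> Y and low non-Gaussian noise:
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 500)
y = np.sin(2 * x) + 0.05 * rng.uniform(-1, 1, 500)
print(infer_direction(x, y))
```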