StepFun AI Releases Step-Audio 2 Mini: An Open-Source 8B Speech-to-Speech AI Model that Surpasses GPT-4o-Audio

The StepFun AI team has released Step-Audio 2 Mini, an 8B-parameter speech-to-speech large audio language model (LALM) that delivers expressive, grounded, and real-time audio interaction. Released under the Apache 2.0 license, this open-source model achieves state-of-the-art performance across speech recognition, audio understanding, and speech conversation benchmarks, surpassing commercial systems such as GPT-4o-Audio. Key Features 1. […] The post StepFun AI Releases Step-Audio 2 Mini: An Open-Source 8B Speech-to-Speech AI Model that Surpasses GPT-4o-Audio appeared first on MarkTechPost.

Quantum-inspired probability metrics define a complete, universal space for statistical learning

arXiv:2508.21086v1 Announce Type: new Abstract: Comparing probability distributions is a core challenge across the natural, social, and computational sciences. Existing methods, such as Maximum Mean Discrepancy (MMD), struggle in high-dimensional and non-compact domains. Here we introduce quantum probability metrics (QPMs), derived by embedding probability measures in the space of quantum states: positive, unit-trace operators on a Hilbert space. This construction extends kernel-based methods and overcomes the incompleteness of MMD on non-compact spaces. Viewed as an integral probability metric (IPM), QPMs have dual functions that uniformly approximate all bounded, uniformly continuous functions on $\mathbb{R}^n$, offering enhanced sensitivity to subtle distributional differences in high dimensions. For empirical distributions, QPMs are readily calculated using eigenvalue methods, with analytic gradients suited for learning and optimization. Although computationally more intensive for large sample sizes ($O(n^3)$ vs. $O(n^2)$), QPMs can significantly improve performance as a drop-in replacement for MMD, as demonstrated in a classic generative modeling task. By combining the rich mathematical framework of quantum mechanics with classical probability theory, this approach lays the foundation for powerful tools to analyze and manipulate probability measures.
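For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of an empirical squared-MMD estimate with a Gaussian kernel. It does not implement the paper's QPM construction; the kernel choice, the `bandwidth` value, and all function names are assumptions made for illustration. The double loop over sample pairs is where the $O(n^2)$ cost mentioned above comes from.

```python
import math
import random

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as lists of floats."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y): O(len(xs) * len(ys))."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))

def sample(n, dim, shift, rng):
    """Draw n points from an isotropic unit-variance Gaussian centred at shift."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]

rng = random.Random(0)
base = sample(100, 2, 0.0, rng)
same = mmd_squared(base, sample(100, 2, 0.0, rng))      # same distribution
shifted = mmd_squared(base, sample(100, 2, 2.0, rng))   # mean shifted by 2
print(f"same: {same:.4f}  shifted: {shifted:.4f}")
```

The estimate stays near zero when both samples come from the same distribution and grows when the means differ.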

Convolutional Rectangular Attention Module

arXiv:2503.10875v2 Announce Type: replace-cross Abstract: In this paper, we introduce a novel spatial attention module that can be easily integrated into any convolutional network. This module guides the model to pay attention to the most discriminative part of an image, which enables the model to attain better performance through end-to-end training. In conventional approaches, a spatial attention map is typically generated in a position-wise manner. This often results in irregular boundaries and can therefore hamper generalization to new samples. In our method, the attention region is constrained to be rectangular. This rectangle is parametrized by only 5 parameters, allowing for better stability and generalization to new samples. In our experiments, our method systematically outperforms the position-wise counterpart. We thus provide a novel, useful spatial attention mechanism for convolutional models. In addition, our module provides interpretability regarding the \textit{where to look} question, as it helps identify the part of the input on which the model focuses to produce its prediction.
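To make the idea of a rectangle described by five numbers concrete, here is a hedged sketch of a soft rectangular mask. The abstract does not say which five parameters are used; this example assumes a centre (`cx`, `cy`), a size (`w`, `h`), and a rotation angle (`theta`), and it uses products of sigmoids as a smooth edge. The `sharpness` value and the function names are hypothetical, and this is not the authors' implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rectangle_mask(height, width, cx, cy, w, h, theta, sharpness=20.0):
    """Soft rectangular mask on a height x width grid.

    Coordinates are normalised to [0, 1]. The five rectangle parameters
    (cx, cy, w, h, theta) are an assumed parametrisation.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    mask = []
    for i in range(height):
        row = []
        for j in range(width):
            # Pixel centre relative to the rectangle centre.
            dx = (j + 0.5) / width - cx
            dy = (i + 0.5) / height - cy
            # Rotate into the rectangle's own axes.
            u = cos_t * dx + sin_t * dy
            v = -sin_t * dx + cos_t * dy
            # Each factor is close to 1 inside the rectangle and close to 0 outside.
            inside_u = sigmoid(sharpness * (w / 2.0 - abs(u)))
            inside_v = sigmoid(sharpness * (h / 2.0 - abs(v)))
            row.append(inside_u * inside_v)
        mask.append(row)
    return mask

mask = rectangle_mask(8, 8, cx=0.5, cy=0.5, w=0.5, h=0.25, theta=0.0)
for row in mask:
    print(" ".join(f"{value:.2f}" for value in row))
```

Because the mask is a smooth function of the five parameters, it could in principle be multiplied with a feature map and trained end to end, which is the property the abstract relies on.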

Adaptive generative moment matching networks for improved learning of dependence structures

arXiv:2508.21531v1 Announce Type: new Abstract: An adaptive bandwidth selection procedure for the mixture kernel in the maximum mean discrepancy (MMD) for fitting generative moment matching networks (GMMNs) is introduced, and its ability to improve the learning of copula random number generators is demonstrated. Based on the relative error of the training loss, the number of kernels is increased during training; additionally, the relative error of the validation loss is used as an early stopping criterion. While training time of such adaptively trained GMMNs (AGMMNs) is similar to that of GMMNs, training performance is increased significantly in comparison to GMMNs, which is assessed and shown based on validation MMD trajectories, samples and validation MMD values. Superiority of AGMMNs over GMMNs, as well as typical parametric copula models, is demonstrated in terms of three applications. First, convergence rates of quasi-random versus pseudo-random samples from high-dimensional copulas are investigated for three functionals of interest and in dimensions as large as 100 for the first time. Second, replicated validation MMDs, as well as Monte Carlo and quasi-Monte Carlo applications based on the expected payoff of a basket call option and the risk measure expected shortfall as functionals are used to demonstrate the improved training of AGMMNs over GMMNs for a copula model fitted to the standardized residuals of the 50 constituents of the S&P 500 index after deGARCHing. Last, both the latter dataset and 50 constituents of the FTSE 100 are used to demonstrate that the improved training of AGMMNs over GMMNs and in comparison to the fitting of classical parametric copula models indeed also translates to an improved model prediction.

BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design

arXiv:2508.21184v1 Announce Type: cross Abstract: We propose a general-purpose approach for improving the ability of Large Language Models (LLMs) to intelligently and adaptively gather information from a user or other external source using the framework of sequential Bayesian experimental design (BED). This enables LLMs to act as effective multi-turn conversational agents and interactively interface with external environments. Our approach, which we call BED-LLM (Bayesian Experimental Design with Large Language Models), is based on iteratively choosing questions or queries that maximize the expected information gain (EIG) about the task of interest given the responses gathered previously. We show how this EIG can be formulated in a principled way using a probabilistic model derived from the LLM's belief distribution and provide detailed insights into key decisions in its construction. Further key to the success of BED-LLM are a number of specific innovations, such as a carefully designed estimator for the EIG, not solely relying on in-context updates for conditioning on previous responses, and a targeted strategy for proposing candidate queries. We find that BED-LLM achieves substantial gains in performance across a wide range of tests based on the 20-questions game and using the LLM to actively infer user preferences, compared to direct prompting of the LLM and other adaptive design strategies.
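The question-selection rule described above can be illustrated with a toy 20-questions setup. This is a minimal sketch under strong assumptions: the hypothesis set is tiny, the prior is uniform, and the answer model `likelihood_yes` is a hand-written table, whereas the paper derives its probabilistic model from the LLM's belief distribution and uses its own EIG estimator. All names and data here are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def expected_information_gain(prior, likelihood_yes):
    """EIG of a yes/no question: prior entropy minus expected posterior entropy.

    prior[h]          : current belief in hypothesis h (sums to 1)
    likelihood_yes[h] : P(answer = yes | h), an assumed answer model
    """
    eig = entropy(prior.values())
    for answer_is_yes in (True, False):
        joint = {
            h: p * (likelihood_yes[h] if answer_is_yes else 1.0 - likelihood_yes[h])
            for h, p in prior.items()
        }
        p_answer = sum(joint.values())
        if p_answer > 0.0:
            posterior = [v / p_answer for v in joint.values()]
            eig -= p_answer * entropy(posterior)
    return eig

animals = ["cat", "dog", "eagle", "shark"]
prior = {a: 0.25 for a in animals}
questions = {
    "Is it a mammal?": {"cat": 1.0, "dog": 1.0, "eagle": 0.0, "shark": 0.0},
    "Does it fly?": {"cat": 0.0, "dog": 0.0, "eagle": 1.0, "shark": 0.0},
}
scores = {q: expected_information_gain(prior, lik) for q, lik in questions.items()}
best_question = max(scores, key=scores.get)
print(scores, best_question)
```

The question that splits the hypotheses evenly scores a full bit, while the lopsided question scores less, so the selection rule prefers the former.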

Convergence of Stochastic Gradient Methods for Wide Two-Layer Physics-Informed Neural Networks

arXiv:2508.21571v1 Announce Type: cross Abstract: Physics-informed neural networks (PINNs) represent a very popular class of neural solvers for partial differential equations. In practice, one often employs stochastic gradient descent type algorithms to train the neural network. Therefore, the convergence guarantee of stochastic gradient descent is of fundamental importance. In this work, we establish the linear convergence of stochastic gradient descent / flow in training over-parameterized two-layer PINNs for a general class of activation functions in the sense of high probability. These results extend the existing result [18] in which gradient descent was analyzed. The challenge of the analysis lies in handling the dynamic randomness introduced by stochastic optimization methods. The key to the analysis lies in ensuring the positive definiteness of suitable Gram matrices during training. The analysis sheds light on the dynamics of the optimization process and provides guarantees on the neural networks trained by stochastic algorithms.

Data-driven Discovery of Digital Twins in Biomedical Research

arXiv:2508.21484v1 Announce Type: cross Abstract: Recent technological advances have expanded the availability of high-throughput biological datasets, enabling the reliable design of digital twins of biomedical systems or patients. Such computational tools represent key reaction networks driving perturbation or drug response and can guide drug discovery and personalized therapeutics. Yet, their development still relies on laborious data integration by the human modeler, so that automated approaches are critically needed. The success of data-driven system discovery in Physics, rooted in clean datasets and well-defined governing laws, has fueled interest in applying similar techniques in Biology, which presents unique challenges. Here, we review methodologies for automatically inferring digital twins from biological time series, which mostly involve symbolic or sparse regression. We evaluate algorithms according to eight biological and methodological challenges, associated with noisy/incomplete data, multiple conditions, prior knowledge integration, latent variables, high dimensionality, unobserved variable derivatives, candidate library design, and uncertainty quantification. On these criteria, sparse regression generally outperforms symbolic regression, particularly when using Bayesian frameworks. We further highlight the emerging role of deep learning and large language models, which enable innovative prior knowledge integration, though the reliability and consistency of such approaches must be improved. While no single method addresses all challenges, we argue that progress in learning digital twins will come from hybrid and modular frameworks combining chemical reaction network-based mechanistic grounding, Bayesian uncertainty quantification, and the generative and knowledge integration capacities of deep learning. To support their development, we further propose a benchmarking framework to evaluate methods across all challenges.

Learning covariate importance for matching in policy-relevant observational research

arXiv:2403.12367v2 Announce Type: replace Abstract: Matching methods are widely used to reduce confounding effects in observational studies, but conventional approaches often treat all covariates as equally important, which can result in poor performance when covariates differ in their relevance to the study. We propose the Priority-Aware one-to-one Matching Algorithm (PAMA), a novel semi-supervised framework that learns a covariate importance measure from a subset of units that are paired by experts and uses it to match additional units. It optimizes a weighted quadratic score that reflects the relevance between each covariate and the study, and iteratively updates the covariate importance measure in the score function using unlabeled data. PAMA is model-free, but we have established that the covariate importance measure -- the learned weights -- is consistent when the oracle matching rule aligns with the design. In addition, we introduce extensions that address imbalanced data, accommodate temporal covariates, and improve robustness to mispaired observations. In simulations, PAMA outperforms standard methods, particularly in high-dimensional settings and under model misspecification. Applied to a real-world study of in-person schooling and COVID-19 transmission, PAMA recovers nearly twice as many expert-designated matches as competing methods using baseline covariates. A self-taught learning extension improves performance in simulations, though its benefit is context-dependent. To our knowledge, PAMA is the first framework to apply semi-supervised learning to observational matching with covariates of unequal relevance. It offers a scalable and interpretable tool for incorporating expert insight into policy-relevant observational research.
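A small sketch can show why covariate weights change which units get paired. It assumes a weighted squared-difference score and a simple greedy one-to-one pairing; both are stand-ins chosen for illustration. The abstract does not specify PAMA's exact score or optimisation, and the step that learns the weights from expert pairs is not reproduced here. The data and function names are hypothetical.

```python
def weighted_score(a, b, weights):
    """Weighted quadratic dissimilarity between two covariate vectors."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))

def greedy_one_to_one(treated, controls, weights):
    """Pair each treated unit with its closest unused control (greedy stand-in)."""
    available = set(range(len(controls)))
    pairs = []
    for t_idx, t in enumerate(treated):
        if not available:
            break
        best = min(available, key=lambda c: weighted_score(t, controls[c], weights))
        pairs.append((t_idx, best))
        available.remove(best)
    return pairs

# Two covariates per unit; the second has a much larger scale.
treated = [[1.0, 10.0]]
controls = [[1.1, 50.0], [3.0, 10.0]]

equal_weights = greedy_one_to_one(treated, controls, [1.0, 1.0])
first_covariate_prioritised = greedy_one_to_one(treated, controls, [1.0, 0.0001])
print(equal_weights, first_covariate_prioritised)
```

With equal weights the second covariate dominates and control 1 is chosen; once the second covariate is down-weighted, control 0 becomes the match.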

Guaranteed Nonconvex Factorization Approach for Tensor Train Recovery

arXiv:2401.02592v3 Announce Type: replace Abstract: In this paper, we provide the first convergence guarantee for the factorization approach. Specifically, to avoid the scaling ambiguity and to facilitate theoretical analysis, we optimize over the so-called left-orthogonal TT format which enforces orthonormality among most of the factors. To ensure the orthonormal structure, we utilize the Riemannian gradient descent (RGD) for optimizing those factors over the Stiefel manifold. We first delve into the TT factorization problem and establish the local linear convergence of RGD. Notably, the rate of convergence only experiences a linear decline as the tensor order increases. We then study the sensing problem that aims to recover a TT format tensor from linear measurements. Assuming the sensing operator satisfies the restricted isometry property (RIP), we show that with a proper initialization, which could be obtained through spectral initialization, RGD also converges to the ground-truth tensor at a linear rate. Furthermore, we expand our analysis to encompass scenarios involving Gaussian noise in the measurements. We prove that RGD can reliably recover the ground truth at a linear rate, with the recovery error exhibiting only polynomial growth in relation to the tensor order. We conduct various experiments to validate our theoretical findings.