Archives AI News

The Next Layer: Augmenting Foundation Models with Structure-Preserving and Attention-Guided Learning for Local Patches to Global Context Awareness in Computational Pathology

arXiv:2508.19914v1 Announce Type: cross Abstract: Foundation models have recently emerged as powerful feature extractors in computational pathology, yet they typically omit mechanisms for leveraging the global spatial structure of tissues and the local contextual relationships among diagnostically relevant regions - key elements for understanding the tumor microenvironment. Multiple instance learning (MIL) remains an essential next step after a foundation model, providing a framework to aggregate patch-level features into slide-level predictions. We present EAGLE-Net, a structure-preserving, attention-guided MIL architecture designed to improve both prediction and interpretability. EAGLE-Net integrates multi-scale absolute spatial encoding to capture global tissue architecture, a top-K neighborhood-aware loss to focus attention on local microenvironments, and a background suppression loss to minimize false positives. We benchmarked EAGLE-Net on large pan-cancer datasets, including three cancer types for classification (10,260 slides) and seven cancer types for survival prediction (4,172 slides), using three distinct histology foundation backbones (REMEDIES, Uni-V1, Uni2-h). Across tasks, EAGLE-Net achieved up to 3% higher classification accuracy and the top concordance indices in 6 of 7 cancer types, producing smooth, biologically coherent attention maps that aligned with expert annotations and highlighted invasive fronts, necrosis, and immune infiltration. These results position EAGLE-Net as a generalizable, interpretable framework that complements foundation models, enabling improved biomarker discovery, prognostic modeling, and clinical decision support.
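
To make the MIL aggregation step concrete, here is a minimal sketch of gated attention-based MIL pooling over patch embeddings in PyTorch. It is illustrative only: the feature dimension, hidden size, and class count are placeholders, and EAGLE-Net's multi-scale absolute spatial encoding, top-K neighborhood-aware loss, and background suppression loss are not reproduced.

```python
# Minimal sketch of gated attention-based MIL pooling (not the EAGLE-Net architecture).
import torch
import torch.nn as nn

class AttentionMILPool(nn.Module):
    def __init__(self, feat_dim: int = 1024, hidden_dim: int = 256, n_classes: int = 3):
        super().__init__()
        # Gated attention scores one weight per patch.
        self.attn_v = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.Tanh())
        self.attn_u = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.Sigmoid())
        self.attn_w = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, patch_feats: torch.Tensor):
        # patch_feats: (num_patches, feat_dim) from a frozen foundation backbone.
        scores = self.attn_w(self.attn_v(patch_feats) * self.attn_u(patch_feats))  # (N, 1)
        attn = torch.softmax(scores, dim=0)              # attention over patches
        slide_feat = (attn * patch_feats).sum(dim=0)     # weighted slide-level feature
        return self.classifier(slide_feat), attn.squeeze(-1)

# Example: 500 patch embeddings of dimension 1024 -> one slide-level prediction.
logits, attention = AttentionMILPool()(torch.randn(500, 1024))
```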

A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules

arXiv:2404.01245v4 Announce Type: replace-cross Abstract: Since ChatGPT was introduced in November 2022, embedding (nearly) unnoticeable statistical signals into text generated by large language models (LLMs), also known as watermarking, has been used as a principled approach to provable detection of LLM-generated text from its human-written counterpart. In this paper, we introduce a general and flexible framework for reasoning about the statistical efficiency of watermarks and designing powerful detection rules. Inspired by the hypothesis testing formulation of watermark detection, our framework starts by selecting a pivotal statistic of the text and a secret key -- provided by the LLM to the verifier -- to enable controlling the false positive rate (the error of mistakenly detecting human-written text as LLM-generated). Next, this framework allows one to evaluate the power of watermark detection rules by obtaining a closed-form expression of the asymptotic false negative rate (the error of incorrectly classifying LLM-generated text as human-written). Our framework further reduces the problem of determining the optimal detection rule to solving a minimax optimization program. We apply this framework to two representative watermarks -- one of which has been internally implemented at OpenAI -- and obtain several findings that can be instrumental in guiding the practice of implementing watermarks. In particular, we derive optimal detection rules for these watermarks under our framework. These theoretically derived detection rules are demonstrated to be competitive and sometimes enjoy a higher power than existing detection approaches through numerical experiments.
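
As a toy illustration of the hypothesis-testing formulation, the sketch below recomputes a pseudorandom pivot for each token from a shared secret key and applies a sum-based detection rule whose null distribution is known in closed form, so the false positive rate is controlled exactly. The key-hashing scheme, vocabulary size, and the score h(y) = -log(1 - y) are arbitrary stand-ins; the paper derives optimal rules rather than this particular one.

```python
# Toy pivot-based watermark detection: under H0 (human text) each pivot is
# Uniform(0,1), so the sum of -log(1 - Y_t) is Gamma(n, 1) and yields an exact p-value.
import numpy as np
from scipy import stats

def pivot_statistics(tokens, key, vocab_size=50_000):
    """Recompute the pseudorandom pivot Y_t for each token from the secret key."""
    pivots = []
    for t, token in enumerate(tokens):
        rng = np.random.default_rng(hash((key, t)) % (2**32))  # verifier shares the key
        u = rng.random(vocab_size)                              # U_t ~ Uniform(0,1)^V
        pivots.append(u[token])                                 # pivot = U_t[w_t]
    return np.array(pivots)

def sum_rule_pvalue(pivots):
    """Sum-based rule with score h(y) = -log(1 - y); exact null calibration."""
    statistic = np.sum(-np.log1p(-pivots))
    return stats.gamma.sf(statistic, a=len(pivots))

# Human-written text: tokens are independent of the key, so the p-value is roughly uniform.
rng = np.random.default_rng(0)
tokens = rng.integers(0, 50_000, size=200)
print(sum_rule_pvalue(pivot_statistics(tokens, key="secret")))
```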

Eigenvalue distribution of the Neural Tangent Kernel in the quadratic scaling

arXiv:2508.20036v1 Announce Type: cross Abstract: We compute the asymptotic eigenvalue distribution of the neural tangent kernel of a two-layer neural network under a specific scaling of dimension. Namely, if $X \in \mathbb{R}^{n \times d}$ is an i.i.d. random matrix, $W \in \mathbb{R}^{d \times p}$ is an i.i.d. $\mathcal{N}(0,1)$ matrix and $D \in \mathbb{R}^{p \times p}$ is a diagonal matrix with i.i.d. bounded entries, we consider the matrix \[ \mathrm{NTK} = \frac{1}{d} X X^\top \odot \frac{1}{p}\, \sigma'\!\left( \frac{1}{\sqrt{d}} X W \right) D^2\, \sigma'\!\left( \frac{1}{\sqrt{d}} X W \right)^{\top} \] where $\sigma'$ is a pseudo-Lipschitz function applied entrywise and under the scaling $\frac{n}{dp} \to \gamma_1$ and $\frac{p}{d} \to \gamma_2$. We describe the asymptotic distribution as the free multiplicative convolution of the Marchenko--Pastur distribution with a deterministic distribution depending on $\sigma$ and $D$.
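
The limiting law above can be probed numerically. The snippet below builds the NTK matrix exactly as in the display, assuming $\sigma'$ is the ReLU derivative and $D$ has i.i.d. Uniform(0.5, 1.5) entries (both are arbitrary choices consistent with the stated conditions), and computes its spectrum at a finite size where n/(dp) and p/d are held roughly constant.

```python
# Monte Carlo check of the NTK spectrum in the quadratic scaling (finite-size approximation).
import numpy as np

n, d, p = 1000, 50, 100            # n/(d*p) ~ 0.2, p/d ~ 2 held roughly fixed
rng = np.random.default_rng(0)

X = rng.standard_normal((n, d))
W = rng.standard_normal((d, p))
D = rng.uniform(0.5, 1.5, size=p)                 # diagonal entries of D (bounded, i.i.d.)

S = (X @ W) / np.sqrt(d)
A = (S > 0).astype(float)                         # sigma'(.) = ReLU derivative, entrywise
NTK = (X @ X.T / d) * ((A * D**2) @ A.T / p)      # Hadamard product of the two factors

eigvals = np.linalg.eigvalsh(NTK)                 # symmetric matrix, real spectrum
print(eigvals.min(), eigvals.max())
```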

Robust Detection of Watermarks for Large Language Models Under Human Edits

arXiv:2411.13868v3 Announce Type: replace-cross Abstract: Watermarking has offered an effective approach to distinguishing text generated by large language models (LLMs) from human-written text. However, the pervasive presence of human edits on LLM-generated text dilutes watermark signals, thereby significantly degrading detection performance of existing methods. In this paper, by modeling human edits through mixture model detection, we introduce a new method in the form of a truncated goodness-of-fit test for detecting watermarked text under human edits, which we refer to as Tr-GoF. We prove that the Tr-GoF test achieves optimality in robust detection of the Gumbel-max watermark in a certain asymptotic regime of substantial text modifications and vanishing watermark signals. Importantly, Tr-GoF achieves this optimality adaptively as it does not require precise knowledge of human edit levels or probabilistic specifications of the LLMs, in contrast to the optimal but impractical (Neyman--Pearson) likelihood ratio test. Moreover, we establish that the Tr-GoF test attains the highest detection efficiency rate in a certain regime of moderate text modifications. In stark contrast, we show that sum-based detection rules, as employed by existing methods, fail to achieve optimal robustness in both regimes because the additive nature of their statistics is less resilient to edit-induced noise. Finally, we demonstrate the competitive and sometimes superior empirical performance of the Tr-GoF test on both synthetic data and open-source LLMs in the OPT and LLaMA families.

Neural Conditional Simulation for Complex Spatial Processes

arXiv:2508.20067v1 Announce Type: cross Abstract: A key objective in spatial statistics is to simulate from the distribution of a spatial process at a selection of unobserved locations conditional on observations (i.e., a predictive distribution) to enable spatial prediction and uncertainty quantification. However, exact conditional simulation from this predictive distribution is intractable or inefficient for many spatial process models. In this paper, we propose neural conditional simulation (NCS), a general method for spatial conditional simulation that is based on neural diffusion models. Specifically, using spatial masks, we implement a conditional score-based diffusion model that evolves Gaussian noise into samples from a predictive distribution when given a partially observed spatial field and spatial process parameters as inputs. The diffusion model relies on a neural network that only requires unconditional samples from the spatial process for training. Once trained, the diffusion model is amortized with respect to the observations in the partially observed field, the number and locations of those observations, and the spatial process parameters, and can therefore be used to conditionally simulate from a broad class of predictive distributions without retraining the neural network. We assess the NCS-generated simulations against simulations from the true conditional distribution of a Gaussian process model, and against Markov chain Monte Carlo (MCMC) simulations from a Brown--Resnick process model for spatial extremes. In the latter case, we show that it is more efficient and accurate to conditionally simulate using NCS than classical MCMC techniques implemented in standard software. We conclude that NCS enables efficient and accurate conditional simulation from spatial predictive distributions that are challenging to sample from using traditional methods.
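
For orientation, here is a structural sketch (with an untrained placeholder network) of the interface the abstract describes: a conditional score network that receives the noisy field, the diffusion time, the masked observations, the spatial mask, and the process parameters, plugged into a plain Euler-Maruyama reverse-SDE sampler. The actual NCS architecture, noise schedule, and training procedure are not reproduced here.

```python
# Structural sketch of mask-conditioned score-based sampling (placeholder network, not NCS itself).
import torch
import torch.nn as nn

class ConditionalScoreNet(nn.Module):
    # Placeholder: concatenate all conditioning channels and apply a small CNN.
    def __init__(self, n_params: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 1 + n_params, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, x, t, obs, mask, params):
        b, _, h, w = x.shape
        t_map = t.view(b, 1, 1, 1).expand(b, 1, h, w)
        p_map = params.view(b, -1, 1, 1).expand(b, params.shape[1], h, w)
        return self.net(torch.cat([x, obs * mask, mask, t_map, p_map], dim=1))

@torch.no_grad()
def sample(score_net, obs, mask, params, n_steps: int = 200, beta: float = 5.0):
    """Euler-Maruyama reverse VP-SDE step: x <- x + [0.5*beta*x + beta*score]*dt + sqrt(beta*dt)*z."""
    x = torch.randn_like(obs)                      # evolve Gaussian noise into a conditional sample
    dt = 1.0 / n_steps
    for i in reversed(range(n_steps)):
        t = torch.full((x.shape[0],), (i + 1) / n_steps)
        score = score_net(x, t, obs, mask, params)
        x = x + (0.5 * beta * x + beta * score) * dt
        if i > 0:
            x = x + (beta * dt) ** 0.5 * torch.randn_like(x)
    return x

field = sample(ConditionalScoreNet(), obs=torch.zeros(1, 1, 32, 32),
               mask=torch.zeros(1, 1, 32, 32), params=torch.tensor([[1.0, 0.5]]))
```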

Benchmarking Diffusion Annealing-Based Bayesian Inverse Problem Solvers

arXiv:2503.03007v2 Announce Type: replace-cross Abstract: In recent years, the ascendance of diffusion modeling as a state-of-the-art generative modeling approach has spurred significant interest in their use as priors in Bayesian inverse problems. However, it is unclear how to optimally integrate a diffusion model trained on the prior distribution with a given likelihood function to obtain posterior samples. While algorithms developed for this purpose can produce high-quality, diverse point estimates of the unknown parameters of interest, they are often tested on problems where the prior distribution is analytically unknown, making it difficult to assess their performance in providing rigorous uncertainty quantification. Motivated by this challenge, this work introduces three benchmark problems for evaluating the performance of diffusion-model-based samplers. The benchmark problems, which are inspired by problems in image inpainting, x-ray tomography, and phase retrieval, have a posterior density that is analytically known. In this setting, approximate ground-truth posterior samples can be obtained, enabling principled evaluation of the performance of posterior sampling algorithms. This work also introduces a general framework for diffusion-model-based posterior sampling, Bayesian Inverse Problem Solvers through Diffusion Annealing (BIPSDA). This framework unifies several recently proposed diffusion-model-based posterior sampling algorithms and contains novel algorithms that can be realized through flexible combinations of design choices. We tested the performance of a set of BIPSDA algorithms, including previously proposed state-of-the-art approaches, on the proposed benchmark problems. The results provide insight into the strengths and limitations of existing diffusion-model-based posterior samplers, while the benchmark problems provide a testing ground for future algorithmic developments.
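
As a minimal example of what an "analytically known posterior" buys you (not one of the paper's benchmarks), a Gaussian prior combined with a linear-Gaussian likelihood yields a closed-form posterior mean and covariance, so exact samples are available to score any approximate sampler against.

```python
# Linear-Gaussian inverse problem with an exact posterior, usable as a ground-truth reference.
import numpy as np

rng = np.random.default_rng(0)
d, m, sigma = 8, 4, 0.1
A = rng.standard_normal((m, d))
mu0, S0 = np.zeros(d), np.eye(d)

x_true = rng.multivariate_normal(mu0, S0)
y = A @ x_true + sigma * rng.standard_normal(m)       # noisy linear observations

# Closed-form Gaussian posterior p(x | y).
S_post = np.linalg.inv(A.T @ A / sigma**2 + np.linalg.inv(S0))
mu_post = S_post @ (A.T @ y / sigma**2 + np.linalg.inv(S0) @ mu0)

exact_samples = rng.multivariate_normal(mu_post, S_post, size=1000)
print(mu_post.round(2))
```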

Deep Learning of Semi-Competing Risk Data via a New Neural Expectation-Maximization Algorithm

arXiv:2212.12028v2 Announce Type: replace Abstract: Prognostication for lung cancer, a leading cause of mortality, remains a complex task, as it needs to quantify the associations between risk factors and health events spanning a patient's entire life. One challenge is that an individual's disease course involves non-terminal (e.g., disease progression) and terminal (e.g., death) events, which form semi-competing relationships. Our motivation comes from the Boston Lung Cancer Study, a large lung cancer survival cohort, which investigates how risk factors influence a patient's disease trajectory. Following developments in the prediction of time-to-event outcomes with neural networks, deep learning has become a focal area for the development of risk prediction methods in survival analysis. However, limited work has been done to predict multi-state or semi-competing risk outcomes, where a patient may experience adverse events such as disease progression prior to death. We propose a novel neural expectation-maximization algorithm to bridge the gap between classical statistical approaches and machine learning. Our algorithm enables estimation of the non-parametric baseline hazards of each state transition, risk functions of predictors, and the degree of dependence among different transitions, via a multi-task deep neural network with transition-specific sub-architectures. We apply our method to the Boston Lung Cancer Study and investigate the impact of clinical and genetic predictors on disease progression and mortality.
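
A structural sketch of the "multi-task deep neural network with transition-specific sub-architectures" might look like the following, assuming the usual illness-death layout (healthy to progression, healthy to death, progression to death); the neural EM step, non-parametric baseline hazards, and dependence estimation are not shown.

```python
# Shared trunk with one sub-network per state transition (structural sketch only).
import torch
import torch.nn as nn

class SemiCompetingRiskNet(nn.Module):
    TRANSITIONS = ("healthy_to_progression", "healthy_to_death", "progression_to_death")

    def __init__(self, n_features: int = 20, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.heads = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for name in self.TRANSITIONS
        })

    def forward(self, x):
        z = self.trunk(x)                      # shared representation of the predictors
        return {name: head(z).squeeze(-1) for name, head in self.heads.items()}

risks = SemiCompetingRiskNet()(torch.randn(16, 20))   # one log relative-risk score per transition
```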

BinConv: A Neural Architecture for Ordinal Encoding in Time-Series Forecasting

arXiv:2505.24595v3 Announce Type: replace-cross Abstract: Recent work in time series forecasting has explored reformulating regression as a classification task. By discretizing the continuous target space into bins and predicting over a fixed set of classes, these approaches benefit from more stable training, improved uncertainty modeling, and compatibility with modern deep learning architectures. However, most existing methods rely on one-hot encoding, which ignores the inherent ordinal structure of the target values. As a result, they fail to convey information about the relative distance between predicted and true values during training. In this paper, we address this limitation by applying Cumulative Binary Encoding (CBE), a monotonic binary representation that transforms both model inputs and outputs. CBE implicitly preserves ordinal and magnitude information, allowing models to learn distance-aware representations while operating within a classification framework. To leverage CBE effectively, we propose BinConv, a fully convolutional neural network architecture designed for probabilistic forecasting. We demonstrate that standard fully connected layers are not only less computationally efficient than convolutional layers when used with CBE, but also degrade forecasting performance. Our experiments on standard benchmark datasets show that BinConv achieves superior performance compared to widely used baselines in both point and probabilistic forecasting, while requiring fewer parameters and enabling faster training.
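
A small, self-contained illustration of cumulative (thermometer-style) binary encoding is given below; the bin edges, the 0.5 decoding threshold, and the decode-to-bin-centers convention are illustrative choices, and BinConv's exact discretization may differ.

```python
# Cumulative binary encoding: bin b is active iff the value reaches that bin,
# so the Hamming distance between encodings grows with the distance between values.
import numpy as np

def cumulative_binary_encode(values, bin_edges):
    """Map each value to its bin, then to a monotone 0/1 vector of length n_bins."""
    bins = np.digitize(values, bin_edges[1:-1])                 # bin index in [0, n_bins)
    return (np.arange(len(bin_edges) - 1) <= bins[:, None]).astype(float)

def cumulative_binary_decode(probs, bin_centers):
    """Decode predicted per-bin probabilities by thresholding the monotone profile."""
    bins = np.sum(probs > 0.5, axis=-1) - 1
    return bin_centers[np.clip(bins, 0, len(bin_centers) - 1)]

edges = np.linspace(0.0, 10.0, 11)                              # 10 bins on [0, 10]
centers = (edges[:-1] + edges[1:]) / 2
enc = cumulative_binary_encode(np.array([0.3, 4.2, 9.7]), edges)
print(enc)                                                       # e.g. 4.2 -> 1 1 1 1 1 0 0 0 0 0
print(cumulative_binary_decode(enc, centers))                    # ~ [0.5, 4.5, 9.5]
```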

Bayes-Optimal Fair Classification with Linear Disparity Constraints via Pre-, In-, and Post-processing

arXiv:2402.02817v3 Announce Type: replace Abstract: Machine learning algorithms may have disparate impacts on protected groups. To address this, we develop methods for Bayes-optimal fair classification, aiming to minimize classification error subject to given group fairness constraints. We introduce the notion of linear disparity measures, which are linear functions of a probabilistic classifier, and bilinear disparity measures, which are also linear in the group-wise regression functions. We show that several popular disparity measures -- the deviations from demographic parity, equality of opportunity, and predictive equality -- are bilinear. We find the form of Bayes-optimal fair classifiers under a single linear disparity measure, by uncovering a connection with the Neyman-Pearson lemma. For bilinear disparity measures, we are able to find the explicit form of Bayes-optimal fair classifiers as group-wise thresholding rules with explicitly characterized thresholds. We develop similar algorithms for when the protected attribute cannot be used at the prediction phase. Moreover, we obtain analogous theoretical characterizations of optimal classifiers for a multi-class protected attribute and for equalized odds. Leveraging our theoretical results, we design methods that learn fair Bayes-optimal classifiers under bilinear disparity constraints. Our methods cover three popular approaches to fairness-aware classification, via pre-processing (Fair Up- and Down-Sampling), in-processing (Fair Cost-Sensitive Classification) and post-processing (a Fair Plug-In Rule). Our methods control disparity directly while achieving near-optimal fairness-accuracy tradeoffs. We show empirically that our methods have state-of-the-art performance compared to existing algorithms. In particular, our pre-processing method can reach higher accuracy than prior pre-processing methods at low disparity levels.
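
The group-wise thresholding form of the classifier can be illustrated with synthetic scores: given an estimate of eta(x) = P(Y = 1 | X = x) and the protected attribute, predict 1 when eta(x) exceeds a group-specific threshold. The grid-search calibration below targets demographic parity and is purely illustrative; the paper characterizes the optimal thresholds analytically.

```python
# Group-wise thresholding on synthetic scores, calibrated to equalize positive rates.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, size=n)                            # protected attribute A in {0, 1}
eta = np.clip(rng.beta(2, 2, size=n) + 0.15 * group, 0, 1)    # synthetic P(Y=1 | X), group-shifted

def positive_rate(threshold, g):
    return np.mean(eta[group == g] > threshold)

# Keep group 0 at threshold 0.5; pick the group-1 threshold that matches its positive rate.
target = positive_rate(0.5, g=0)
grid = np.linspace(0, 1, 1001)
t1 = grid[np.argmin([abs(positive_rate(t, g=1) - target) for t in grid])]

y_hat = eta > np.where(group == 0, 0.5, t1)                   # group-wise thresholding rule
gap = abs(y_hat[group == 0].mean() - y_hat[group == 1].mean())
print(f"thresholds: (0.50, {t1:.2f}), demographic parity gap: {gap:.3f}")
```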

Fractal Flow: Hierarchical and Interpretable Normalizing Flow via Topic Modeling and Recursive Strategy

arXiv:2508.19750v1 Announce Type: new Abstract: Normalizing Flows provide a principled framework for high-dimensional density estimation and generative modeling by constructing invertible transformations with tractable Jacobian determinants. We propose Fractal Flow, a novel normalizing flow architecture that enhances both expressiveness and interpretability through two key innovations. First, we integrate Kolmogorov-Arnold Networks and incorporate Latent Dirichlet Allocation into normalizing flows to construct a structured, interpretable latent space and model hierarchical semantic clusters. Second, inspired by Fractal Generative Models, we introduce a recursive modular design into normalizing flows to improve transformation interpretability and estimation accuracy. Experiments on MNIST, FashionMNIST, CIFAR-10, and geophysical data demonstrate that the Fractal Flow achieves latent clustering, controllable generation, and superior estimation accuracy.
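
As background for the opening sentence, the sketch below implements a single affine coupling layer, the standard invertible transformation with a tractable (triangular) Jacobian determinant from which normalizing flows are composed; none of the paper's KAN, LDA-based latent structure, or recursive fractal design appears here.

```python
# One affine coupling layer: invertible by construction, with log|det J| = sum of log-scales.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(nn.Linear(self.half, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * (dim - self.half)))

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(x1).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)                       # keep scales numerically tame
        y2 = x2 * torch.exp(log_s) + t                  # only the second half is transformed
        log_det = log_s.sum(dim=-1)                     # triangular Jacobian: sum of log-scales
        return torch.cat([x1, y2], dim=-1), log_det

    def inverse(self, y):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        log_s, t = self.net(y1).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)
        return torch.cat([y1, (y2 - t) * torch.exp(-log_s)], dim=-1)

layer = AffineCoupling(dim=4)
x = torch.randn(8, 4)
y, log_det = layer(x)
print(torch.allclose(layer.inverse(y), x, atol=1e-5))   # invertibility check -> True
```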