Archives AI News

High-Order Tensor Regression in Sparse Convolutional Neural Networks

arXiv:2501.01239v5 Announce Type: replace Abstract: This article presents a generic approach to convolution that significantly differs from conventional methodologies in the current Machine Learning literature. The approach, in its mathematical aspects, proved to be clear and concise, particularly when high-order tensors are involved. In this context, a rational theory of regression in neural networks is developed, as a framework for a generic view of sparse convolutional neural networks, the primary focus of this study. As a direct outcome, the classic Backpropagation Algorithm is redefined to align with this rational tensor-based approach and presented in its simplest, most generic form.

Latent Variable Modeling for Robust Causal Effect Estimation

arXiv:2508.20259v1 Announce Type: new Abstract: Latent variable models provide a powerful framework for incorporating and inferring unobserved factors in observational data. In causal inference, they help account for hidden factors influencing treatment or outcome, thereby addressing challenges posed by missing or unmeasured covariates. This paper proposes a new framework that integrates latent variable modeling into the double machine learning (DML) paradigm to enable robust causal effect estimation in the presence of such hidden factors. We consider two scenarios: one where a latent variable affects only the outcome, and another where it may influence both treatment and outcome. To ensure tractability, we incorporate latent variables only in the second stage of DML, separating representation learning from latent inference. We demonstrate the robustness and effectiveness of our method through extensive experiments on both synthetic and real-world datasets.

Improving Quantization with Post-Training Model Expansion

arXiv:2503.17513v2 Announce Type: replace Abstract: The size of a model has been a strong predictor of its quality, as well as its cost. As such, the trade-off between model cost and quality has been well-studied. Post-training optimizations like quantization and pruning have typically focused on reducing the overall volume of pre-trained models to reduce inference costs while maintaining model quality. However, recent advancements have introduced optimization techniques that, interestingly, expand models post-training, increasing model size to improve quality when reducing volume. For instance, to enable 4-bit weight and activation quantization, incoherence processing often necessitates inserting online Hadamard rotations in the compute graph, and preserving highly sensitive weights often calls for additional higher precision computations. However, if application requirements cannot be met, the prevailing solution is to relax quantization constraints. In contrast, we demonstrate post-training model expansion is a viable strategy to improve model quality within a quantization co-design space, and provide theoretical justification. We show it is possible to progressively and selectively expand the size of a pre-trained large language model (LLM) to improve model quality without end-to-end retraining. In particular, when quantizing the weights and activations to 4 bits for Llama3 1B, we reduce the gap to full-precision perplexity by an average of 9% relative to both QuaRot and SpinQuant with only 5% more parameters, which is still a 3.8% reduction in volume relative to a BF16 reference model.

Generalizable AI Model for Indoor Temperature Forecasting Across Sub-Saharan Africa

arXiv:2508.20260v1 Announce Type: new Abstract: This study presents a lightweight, domain-informed AI model for predicting indoor temperatures in naturally ventilated schools and homes in Sub-Saharan Africa. The model extends the Temp-AI-Estimator framework, trained on Tanzanian school data, and evaluated on Nigerian schools and Gambian homes. It achieves robust cross-country performance using only minimal accessible inputs, with mean absolute errors of 1.45{deg}C for Nigerian schools and 0.65{deg}C for Gambian homes. These findings highlight AI's potential for thermal comfort management in resource-constrained environments.

Generative AI Against Poaching: Latent Composite Flow Matching for Wildlife Conservation

arXiv:2508.14342v2 Announce Type: replace Abstract: Poaching poses significant threats to wildlife and biodiversity. A valuable step in reducing poaching is to forecast poacher behavior, which can inform patrol planning and other conservation interventions. Existing poaching prediction methods based on linear models or decision trees lack the expressivity to capture complex, nonlinear spatiotemporal patterns. Recent advances in generative modeling, particularly flow matching, offer a more flexible alternative. However, training such models on real-world poaching data faces two central obstacles: imperfect detection of poaching events and limited data. To address imperfect detection, we integrate flow matching with an occupancy-based detection model and train the flow in latent space to infer the underlying occupancy state. To mitigate data scarcity, we adopt a composite flow initialized from a linear-model prediction rather than random noise which is the standard in diffusion models, injecting prior knowledge and improving generalization. Evaluations on datasets from two national parks in Uganda show consistent gains in predictive accuracy.

A Systematic Review on the Generative AI Applications in Human Medical Genomics

arXiv:2508.20275v1 Announce Type: new Abstract: Although traditional statistical techniques and machine learning methods have contributed significantly to genetics and, in particular, inherited disease diagnosis, they often struggle with complex, high-dimensional data, a challenge now addressed by state-of-the-art deep learning models. Large language models (LLMs), based on transformer architectures, have excelled in tasks requiring contextual comprehension of unstructured medical data. This systematic review examines the role of LLMs in the genetic research and diagnostics of both rare and common diseases. Automated keyword-based search in PubMed, bioRxiv, medRxiv, and arXiv was conducted, targeting studies on LLM applications in diagnostics and education within genetics and removing irrelevant or outdated models. A total of 172 studies were analyzed, highlighting applications in genomic variant identification, annotation, and interpretation, as well as medical imaging advancements through vision transformers. Key findings indicate that while transformer-based models significantly advance disease and risk stratification, variant interpretation, medical imaging analysis, and report generation, major challenges persist in integrating multimodal data (genomic sequences, imaging, and clinical records) into unified and clinically robust pipelines, facing limitations in generalizability and practical implementation in clinical settings. This review provides a comprehensive classification and assessment of the current capabilities and limitations of LLMs in transforming hereditary disease diagnostics and supporting genetic education, serving as a guide to navigate this rapidly evolving field.

FFHFlow: Diverse and Uncertainty-Aware Dexterous Grasp Generation via Flow Variational Inference

arXiv:2407.15161v3 Announce Type: replace-cross Abstract: Synthesizing diverse, uncertainty-aware grasps for multi-fingered hands from partial observations remains a critical challenge in robot learning. Prior generative methods struggle to model the intricate grasp distribution of dexterous hands and often fail to reason about shape uncertainty inherent in partial point clouds, leading to unreliable or overly conservative grasps. We propose FFHFlow, a flow-based variational framework that generates diverse, robust multi-finger grasps while explicitly quantifying perceptual uncertainty in the partial point clouds. Our approach leverages a normalizing flow-based deep latent variable model to learn a hierarchical grasp manifold, overcoming the mode collapse and rigid prior limitations of conditional Variational Autoencoders (cVAEs). By exploiting the invertibility and exact likelihoods of flows, FFHFlow introspects shape uncertainty in partial observations and identifies novel object structures, enabling risk-aware grasp synthesis. To further enhance reliability, we integrate a discriminative grasp evaluator with the flow likelihoods, formulating an uncertainty-aware ranking strategy that prioritizes grasps robust to shape ambiguity. Extensive experiments in simulation and real-world setups demonstrate that FFHFlow outperforms state-of-the-art baselines (including diffusion models) in grasp diversity and success rate, while achieving run-time efficient sampling. We also showcase its practical value in cluttered and confined environments, where diversity-driven sampling excels by mitigating collisions (Project Page: https://sites.google.com/view/ffhflow/home/).

Objective Value Change and Shape-Based Accelerated Optimization for the Neural Network Approximation

arXiv:2508.20290v1 Announce Type: new Abstract: This paper introduce a novel metric of an objective function f, we say VC (value change) to measure the difficulty and approximation affection when conducting an neural network approximation task, and it numerically supports characterizing the local performance and behavior of neural network approximation. Neural networks often suffer from unpredictable local performance, which can hinder their reliability in critical applications. VC addresses this issue by providing a quantifiable measure of local value changes in network behavior, offering insights into the stability and performance for achieving the neural-network approximation. We investigate some fundamental theoretical properties of VC and identified two intriguing phenomena in neural network approximation: the VC-tendency and the minority-tendency. These trends respectively characterize how pointwise errors evolve in relation to the distribution of VC during the approximation process.In addition, we propose a novel metric based on VC, which measures the distance between two functions from the perspective of variation. Building upon this metric, we further propose a new preprocessing framework for neural network approximation. Numerical results including the real-world experiment and the PDE-related scientific problem support our discovery and pre-processing acceleration method.

Quantum-informed machine learning for the prediction of chaotic dynamical systems

arXiv:2507.19861v3 Announce Type: replace-cross Abstract: We introduce a quantum-informed machine learning (QIML) framework for the long-term dynamical behavior of high-dimensional chaotic systems. The method combines a one-time, offline-trained quantum generative model with a classical autoregressive predictor for spatiotemporal field generation. The quantum model learns a quantum prior (Q-Prior) that guides the representation of small-scale interactions and improves the modeling of fine-scale dynamics. We evaluate QIML on three representative systems: the Kuramoto-Sivashinsky equation, the two-dimensional Kolmogorov flow, and a cross-section of fully developed three-dimensional turbulent channel flow used as a realistic inflow condition. Compared to the classical baseline, QIML yields up to 17.25% improvement in predictive distribution accuracy and a 29.36% improvement in the fidelity of the predicted full energy spectrum. For turbulent channel inflow, the Q-Prior is essential: without it, the model fails to evolve in time, while QIML produces stable, physically consistent forecasts that surpass leading machine learning models for PDEs, including the Fourier Neural Operator and Markov Neural Operator, whose errors diverge. Beyond accuracy, QIML also achieves a memory advantage, compressing multi-megabyte datasets into a kilobyte-scale Q-Prior that captures only the invariant measure needed to guide the classical model, thus circumventing Holevo's bound by avoiding full data reconstruction. Our findings provide a practical and scalable pathway for integrating the advantages brought by quantum devices into large-scale scientific, engineering modeling and simulation.

Beacon: Post-Training Quantization with Integrated Grid Selection

arXiv:2508.20293v1 Announce Type: new Abstract: Quantization is a widely used compression technique for reducing the memory and computation costs of large pre-trained models. A key challenge in per-channel post-training quantization (PTQ) is selecting appropriate scaling factors to replace weight values with values from a scaled quantization grid. Existing methods typically fix the scale at the outset via heuristic tuning or grid search. In this note, we propose Beacon, a simple and effective algorithm that eliminates the need for such manual tuning. Beacon performs per-channel PTQ directly using a fixed non-scaled alphabet and automatically determines the optimal scaling factors by exploiting the geometry of symmetric scalar quantization. It supports both symmetric and asymmetric quantization with minimal modifications and does not rely on back-propagation or large calibration sets. Despite its simplicity and tuning-free nature, Beacon achieves competitive performance compared to state-of-the-art methods, making it a practical solution for efficient model deployment.