Archives AI News

Beyond Prediction: Reinforcement Learning as the Defining Leap in Healthcare AI

arXiv:2508.21101v1 Announce Type: new Abstract: Reinforcement learning (RL) marks a fundamental shift in how artificial intelligence is applied in healthcare. Instead of merely predicting outcomes, RL actively decides interventions with long term goals. Unlike traditional models that operate on fixed associations, RL systems learn through trial, feedback, and long-term reward optimization, introducing transformative possibilities and new risks. From an information fusion lens, healthcare RL typically integrates multi-source signals such as vitals, labs clinical notes, imaging and device telemetry using temporal and decision-level mechanisms. These systems can operate within centralized, federated, or edge architectures to meet real-time clinical constraints, and naturally span data, features and decision fusion levels. This survey explore RL's rise in healthcare as more than a set of tools, rather a shift toward agentive intelligence in clinical environments. We first structure the landscape of RL techniques including model-based and model-free methods, offline and batch-constrained approaches, and emerging strategies for reward specification and uncertainty calibration through the lens of healthcare constraints. We then comprehensively analyze RL applications spanning critical care, chronic disease, mental health, diagnostics, and robotic assistance, identifying their trends, gaps, and translational bottlenecks. In contrast to prior reviews, we critically analyze RL's ethical, deployment, and reward design challenges, and synthesize lessons for safe, human-aligned policy learning. This paper serves as both a a technical roadmap and a critical reflection of RL's emerging transformative role in healthcare AI not as prediction machinery, but as agentive clinical intelligence.

Adaptive LLM Routing under Budget Constraints

arXiv:2508.21141v1 Announce Type: new Abstract: Large Language Models (LLMs) have revolutionized natural language processing, but their varying capabilities and costs pose challenges in practical applications. LLM routing addresses this by dynamically selecting the most suitable LLM for each query/task. Previous approaches treat this as a supervised learning problem, assuming complete knowledge of optimal query-LLM pairings. However, real-world scenarios lack such comprehensive mappings and face evolving user queries. We thus propose to study LLM routing as a contextual bandit problem, enabling adaptive decision-making using bandit feedback without requiring exhaustive inference across all LLMs for all queries (in contrast to supervised routing). To address this problem, we develop a shared embedding space for queries and LLMs, where query and LLM embeddings are aligned to reflect their affinity. This space is initially learned from offline human preference data and refined through online bandit feedback. We instantiate this idea through Preference-prior Informed Linucb fOr adaptive rouTing (PILOT), a novel extension of LinUCB. To handle diverse user budgets for model routing, we introduce an online cost policy modeled as a multi-choice knapsack problem, ensuring resource-efficient routing.

Privacy Auditing Synthetic Data Release through Local Likelihood Attacks

arXiv:2508.21146v1 Announce Type: new Abstract: Auditing the privacy leakage of synthetic data is an important but unresolved problem. Most existing privacy auditing frameworks for synthetic data rely on heuristics and unreasonable assumptions to attack the failure modes of generative models, exhibiting limited capability to describe and detect the privacy exposure of training data through synthetic data release. In this paper, we study designing Membership Inference Attacks (MIAs) that specifically exploit the observation that tabular generative models tend to significantly overfit to certain regions of the training distribution. Here, we propose Generative Likelihood Ratio Attack (Gen-LRA), a novel, computationally efficient No-Box MIA that, with no assumption of model knowledge or access, formulates its attack by evaluating the influence a test observation has in a surrogate model's estimation of a local likelihood ratio over the synthetic data. Assessed over a comprehensive benchmark spanning diverse datasets, model architectures, and attack parameters, we find that Gen-LRA consistently dominates other MIAs for generative models across multiple performance metrics. These results underscore Gen-LRA's effectiveness as a privacy auditing tool for the release of synthetic data, highlighting the significant privacy risks posed by generative model overfitting in real-world applications.

An Explainable, Attention-Enhanced, Bidirectional Long Short-Term Memory Neural Network for Joint 48-Hour Forecasting of Temperature, Irradiance, and Relative Humidity

arXiv:2508.21109v1 Announce Type: new Abstract: This paper presents a Deep Learning (DL) framework for 48-hour forecasting of temperature, solar irradiance, and relative humidity to support Model Predictive Control (MPC) in smart HVAC systems. The approach employs a stacked Bidirectional Long Short-Term Memory (BiLSTM) network with attention, capturing temporal and cross-feature dependencies by jointly predicting all three variables. Historical meteorological data (2019-2022) with encoded cyclical time features were used for training, while 2023 data evaluated generalization. The model achieved Mean Absolute Errors of 1.3 degrees Celsius (temperature), 31 W/m2 (irradiance), and 6.7 percentage points (humidity), outperforming state-of-the-art numerical weather prediction and machine learning benchmarks. Integrated Gradients quantified feature contributions, and attention weights revealed temporal patterns, enhancing interpretability. By combining multivariate forecasting, attention-based DL, and explainability, this work advances data-driven weather prediction. The demonstrated accuracy and transparency highlight the framework's potential for energy-efficient building control through reliable short-term meteorological forecasting.

Automating the Deep Space Network Data Systems; A Case Study in Adaptive Anomaly Detection through Agentic AI

arXiv:2508.21111v1 Announce Type: new Abstract: The Deep Space Network (DSN) is NASA's largest network of antenna facilities that generate a large volume of multivariate time-series data. These facilities contain DSN antennas and transmitters that undergo degradation over long periods of time, which may cause costly disruptions to the data flow and threaten the earth-connection of dozens of spacecraft that rely on the Deep Space Network for their lifeline. The purpose of this study was to experiment with different methods that would be able to assist JPL engineers with directly pinpointing anomalies and equipment degradation through collected data, and continue conducting maintenance and operations of the DSN for future space missions around our universe. As such, we have researched various machine learning techniques that can fully reconstruct data through predictive analysis, and determine anomalous data entries within real-time datasets through statistical computations and thresholds. On top of the fully trained and tested machine learning models, we have also integrated the use of a reinforcement learning subsystem that classifies identified anomalies based on severity level and a Large Language Model that labels an explanation for each anomalous data entry, all of which can be improved and fine-tuned over time through human feedback/input. Specifically, for the DSN transmitters, we have also implemented a full data pipeline system that connects the data extraction, parsing, and processing workflow all together as there was no coherent program or script for performing these tasks before. Using this data pipeline system, we were able to then also connect the models trained from DSN antenna data, completing the data workflow for DSN anomaly detection. This was all wrapped around and further connected by an agentic AI system, where complex reasoning was utilized to determine the classifications and predictions of anomalous data.

Dynamic Low-rank Approximation of Full-Matrix Preconditioner for Training Generalized Linear Models

arXiv:2508.21106v1 Announce Type: new Abstract: Adaptive gradient methods like Adagrad and its variants are widespread in large-scale optimization. However, their use of diagonal preconditioning matrices limits the ability to capture parameter correlations. Full-matrix adaptive methods, approximating the exact Hessian, can model these correlations and may enable faster convergence. At the same time, their computational and memory costs are often prohibitive for large-scale models. To address this limitation, we propose AdaGram, an optimizer that enables efficient full-matrix adaptive gradient updates. To reduce memory and computational overhead, we utilize fast symmetric factorization for computing the preconditioned update direction at each iteration. Additionally, we maintain the low-rank structure of a preconditioner along the optimization trajectory using matrix integrator methods. Numerical experiments on standard machine learning tasks show that AdaGram converges faster or matches the performance of diagonal adaptive optimizers when using rank five and smaller rank approximations. This demonstrates AdaGram's potential as a scalable solution for adaptive optimization in large models.

PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning

arXiv:2508.21104v1 Announce Type: new Abstract: Critic-free reinforcement learning methods, particularly group policies, have attracted considerable attention for their efficiency in complex tasks. However, these methods rely heavily on multiple sampling and comparisons within the policy to estimate advantage, which may cause the policy to fall into local optimum and increase computational cost. To address these issues, we propose PVPO, an efficient reinforcement learning method enhanced by an advantage reference anchor and data pre-sampling. Specifically, we use the reference model to rollout in advance and employ the calculated reward score as a reference anchor. Our approach effectively corrects the cumulative bias introduced by intra-group comparisons and significantly reduces reliance on the number of rollouts. Meanwhile, the reference model can assess sample difficulty during data pre-sampling, enabling effective selection of high-gain data to improve training efficiency. Experiments conducted on nine datasets across two domains demonstrate that PVPO achieves State-Of-The-Art (SOTA) performance. Our approach not only demonstrates robust generalization across multiple tasks, but also exhibits scalable performance across models of varying scales.

Spatiotemporal EEG-Based Emotion Recognition Using SAM Ratings from Serious Games with Hybrid Deep Learning

arXiv:2508.21103v1 Announce Type: new Abstract: Recent advancements in EEG-based emotion recognition have shown promising outcomes using both deep learning and classical machine learning approaches; however, most existing studies focus narrowly on binary valence prediction or subject-specific classification, which limits generalizability and deployment in real-world affective computing systems. To address this gap, this paper presents a unified, multigranularity EEG emotion classification framework built on the GAMEEMO dataset, which consists of 14-channel EEG recordings and continuous self-reported emotion ratings (boring, horrible, calm, and funny) from 28 subjects across four emotion-inducing gameplay scenarios. Our pipeline employs a structured preprocessing strategy that comprises temporal window segmentation, hybrid statistical and frequency-domain feature extraction, and z-score normalization to convert raw EEG signals into robust, discriminative input vectors. Emotion labels are derived and encoded across three complementary axes: (i) binary valence classification based on the averaged polarity of positive and negative emotion ratings, and (ii) Multi-class emotion classification, where the presence of the most affective state is predicted. (iii) Fine-grained multi-label representation via binning each emotion into 10 ordinal classes. We evaluate a broad spectrum of models, including Random Forest, XGBoost, and SVM, alongside deep neural architectures such as LSTM, LSTM-GRU, and CNN-LSTM. Among these, the LSTM-GRU model consistently outperforms the others, achieving an F1-score of 0.932 in the binary valence task and 94.5% and 90.6% in both multi-class and Multi-Label emotion classification.

C-Flat++: Towards a More Efficient and Powerful Framework for Continual Learning

arXiv:2508.18860v2 Announce Type: replace Abstract: Balancing sensitivity to new tasks and stability for retaining past knowledge is crucial in continual learning (CL). Recently, sharpness-aware minimization has proven effective in transfer learning and has also been adopted in continual learning (CL) to improve memory retention and learning efficiency. However, relying on zeroth-order sharpness alone may favor sharper minima over flatter ones in certain settings, leading to less robust and potentially suboptimal solutions. In this paper, we propose textbf{C}ontinual textbf{Flat}ness (textbf{C-Flat}), a method that promotes flatter loss landscapes tailored for CL. C-Flat offers plug-and-play compatibility, enabling easy integration with minimal modifications to the code pipeline. Besides, we present a general framework that integrates C-Flat into all major CL paradigms and conduct comprehensive comparisons with loss-minima optimizers and flat-minima-based CL methods. Our results show that C-Flat consistently improves performance across a wide range of settings. In addition, we introduce C-Flat++, an efficient yet effective framework that leverages selective flatness-driven promotion, significantly reducing the update cost required by C-Flat. Extensive experiments across multiple CL methods, datasets, and scenarios demonstrate the effectiveness and efficiency of our proposed approaches. Code is available at https://github.com/WanNaa/C-Flat.