Archives AI News

Benchmark Success, Clinical Failure: When Reinforcement Learning Optimizes for Benchmarks, Not Patients

arXiv:2512.23090v2 Announce Type: replace-cross Abstract: Recent Reinforcement Learning (RL) advances for Large Language Models (LLMs) have improved reasoning tasks, yet their resource-constrained application to medical imaging remains underexplored. We introduce ChexReason, a vision-language model trained via R1-style methodology (SFT followed…

January 5, 2026

Reinforcement Learning with Function Approximation for Non-Markov Processes

arXiv:2601.00151v1 Announce Type: new Abstract: We study reinforcement learning methods with linear function approximation under non-Markov state and cost processes. We first consider the policy evaluation method and show that the algorithm converges under suitable ergodicity conditions on the underlying…

January 5, 2026

Information-Theoretic Quality Metric of Low-Dimensional Embeddings

arXiv:2512.23981v2 Announce Type: replace Abstract: In this work we study the quality of low-dimensional embeddings from an explicitly information-theoretic perspective. We begin by noting that classical evaluation metrics such as stress, rank-based neighborhood criteria, or Local Procrustes quantify distortions in…

January 5, 2026

Dynamic Bayesian Optimization Framework for Instruction Tuning in Partial Differential Equation Discovery

arXiv:2601.00088v1 Announce Type: new Abstract: Large Language Models (LLMs) show promise for equation discovery, yet their outputs are highly sensitive to prompt phrasing, a phenomenon we term instruction brittleness. Static prompts cannot adapt to the evolving state of a multi-step…

January 5, 2026

GRL-SNAM: Geometric Reinforcement Learning with Path Differential Hamiltonians for Simultaneous Navigation and Mapping in Unknown Environments

arXiv:2601.00116v1 Announce Type: new Abstract: We present GRL-SNAM, a geometric reinforcement learning framework for Simultaneous Navigation and Mapping (SNAM) in unknown environments. A SNAM problem is challenging as it needs to design hierarchical or joint policies of multiple agents that control…

January 5, 2026

New research may help scientists predict when a humid heat wave will break

As these events become more common at midlatitudes, a phenomenon called an atmospheric inversion will determine how long they last.

January 5, 2026

The Curse of Depth in Large Language Models

arXiv:2502.05795v3 Announce Type: replace Abstract: In this paper, we introduce the Curse of Depth, a concept that highlights, explains, and addresses the recent observation in modern Large Language Models (LLMs) where nearly half of the layers are less effective than…

January 5, 2026

Flattening Hierarchies with Policy Bootstrapping

arXiv:2505.14975v3 Announce Type: replace Abstract: Offline goal-conditioned reinforcement learning (GCRL) is a promising approach for pretraining generalist policies on large datasets of reward-free trajectories, akin to the self-supervised objectives used to train foundation models for computer vision and natural language…

January 5, 2026

Generative Conditional Missing Imputation Networks

arXiv:2601.00517v1 Announce Type: cross Abstract: In this study, we introduce a sophisticated generative conditional strategy designed to impute missing values within datasets, an area of considerable importance in statistical analysis. Specifically, we initially elucidate the theoretical underpinnings of the Generative…

January 5, 2026

Density-Based Algorithms for Corruption-Robust Contextual Search and Convex Optimization

arXiv:2206.07528v3 Announce Type: replace Abstract: We study the problem of contextual search, a generalization of binary search in higher dimensions, in the adversarial noise model. Let $d$ be the dimension of the problem, $T$ be the time horizon and $C$…

January 5, 2026