Archives AI News

From Static to Dynamic: Enhancing Offline-to-Online Reinforcement Learning via Energy-Guided Diffusion Stratification

arXiv:2511.03828v1 Announce Type: new Abstract: Transitioning from offline to online reinforcement learning (RL) poses critical challenges due to distributional shifts between the fixed behavior policy in the offline dataset and the evolving policy during online learning. Although this issue is…

November 7, 2025

How Memory in Optimization Algorithms Implicitly Modifies the Loss

arXiv:2502.02132v2 Announce Type: replace Abstract: In modern optimization methods used in deep learning, each update depends on the history of previous iterations, often referred to as memory, and this dependence decays fast as the iterates go further into the past.…
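To make "memory that decays fast" concrete, here is a minimal sketch that uses heavy-ball momentum as the example optimizer. That choice is an assumption: the abstract excerpt does not name a specific method, and all function names below are hypothetical. The recurrence v_t = beta * v_{t-1} + g_t unrolls into a weighted sum over past gradients in which a gradient j steps old is scaled by beta**j, so its influence decays geometrically.

```python
# Hypothetical illustration: heavy-ball momentum as an optimizer with "memory".
# The recurrence v_t = beta * v_{t-1} + g_t (with v_{-1} = 0) is equivalent to
# v_t = sum_k beta**(t - k) * g_k, so a gradient j steps in the past is weighted by beta**j.

def momentum_velocity(grads, beta):
    """Run the recurrence over a sequence of gradients and return the final velocity."""
    v = 0.0
    for g in grads:
        v = beta * v + g
    return v

def unrolled_velocity(grads, beta):
    """Compute the same velocity as an explicit weighted sum over the gradient history."""
    t = len(grads) - 1
    return sum(beta ** (t - k) * g for k, g in enumerate(grads))

def past_gradient_weights(beta, n):
    """Weight on the gradient from j steps ago, for j = 0..n-1 (geometric decay)."""
    return [beta ** j for j in range(n)]

def sgd_momentum_step(theta, v_prev, grad, lr, beta):
    """One heavy-ball update: v_t = beta * v_{t-1} + g_t, then theta_{t+1} = theta_t - lr * v_t."""
    v = beta * v_prev + grad
    return theta - lr * v, v

grads = [1.0, -0.5, 2.0, 0.25]
beta = 0.9
print(momentum_velocity(grads, beta), unrolled_velocity(grads, beta))
print(past_gradient_weights(beta, 4))
```

Both velocity functions compute the same quantity; the unrolled form simply makes the decaying weights on the history explicit.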

November 7, 2025

Higher-Order Causal Structure Learning with Additive Models

arXiv:2511.03831v1 Announce Type: new Abstract: Causal structure learning has long been the central task of inferring causal insights from data. Despite the abundance of real-world processes exhibiting higher-order mechanisms, however, an explicit treatment of interactions in causal discovery has received…

November 7, 2025

Explicit Density Approximation for Neural Implicit Samplers Using a Bernstein-Based Convex Divergence

arXiv:2506.04700v2 Announce Type: replace Abstract: Rank-based statistical metrics, such as the invariant statistical loss (ISL), have recently emerged as robust and practically effective tools for training implicit generative models. In this work, we introduce dual-ISL, a novel likelihood-free objective for…
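Rank-based metrics of this kind build on a simple property: if a generator matches the data distribution (and the distribution is continuous), the rank of a real sample among K generated samples is uniform on {0, ..., K}. The sketch below only computes that rank statistic and its histogram; it is not an implementation of ISL or dual-ISL. The Gaussian "data" and "generators" are stand-ins, and the function names are hypothetical.

```python
import random

def rank_statistic(y, generated):
    """Count of generated samples strictly below the real sample y (an integer in 0..K)."""
    return sum(1 for x in generated if x < y)

def rank_histogram(real_samples, sampler, K, rng):
    """For each real sample, draw K generator samples and tally the rank (K + 1 bins)."""
    counts = [0] * (K + 1)
    for y in real_samples:
        generated = [sampler(rng) for _ in range(K)]
        counts[rank_statistic(y, generated)] += 1
    return counts

rng = random.Random(0)
real = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # stand-in "data" distribution: N(0, 1)
K = 4
matched = rank_histogram(real, lambda r: r.gauss(0.0, 1.0), K, rng)  # generator equals data
shifted = rank_histogram(real, lambda r: r.gauss(1.0, 1.0), K, rng)  # generator mean is off by 1
print("matched:", matched)   # roughly flat, about 1000 per bin
print("shifted:", shifted)   # mass piles up at low ranks
```

A flat histogram signals a well-matched generator, while any skew reveals a mismatch; a training objective can penalize the distance of this rank distribution from uniform.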

November 7, 2025

Enhancing Q-Value Updates in Deep Q-Learning via Successor-State Prediction

arXiv:2511.03836v1 Announce Type: new Abstract: Deep Q-Networks (DQNs) estimate future returns by learning from transitions sampled from a replay buffer. However, the target updates in DQN often rely on next states generated by actions from a past, potentially suboptimal, policy. As…
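For context, here is a minimal sketch of the standard DQN target computation from a replay buffer, y = r + gamma * max_a' Q_target(s', a'). It illustrates why the next state s' reflects whichever policy collected the transition. It does not implement the paper's successor-state prediction. The tabular-style `target_q` function and the names `ReplayBuffer` and `td_targets` are hypothetical.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO store of (s, a, r, s', done) transitions."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, as in vanilla DQN.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def td_targets(batch, target_q, gamma):
    """Standard DQN targets y = r + gamma * max_a' Q_target(s', a') (no bootstrap if terminal).

    s' is the next state stored at collection time, i.e. it was reached by the
    action of whatever (possibly older) policy generated the transition.
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else max(target_q(next_state))
        targets.append(reward + gamma * bootstrap)
    return targets

def target_q(state):
    # Hypothetical 2-action target network: Q values are linear in a scalar state.
    return [0.5 * state, 1.0 * state]

buf = ReplayBuffer(capacity=100)
buf.push(0, 1, 1.0, 1, False)
buf.push(1, 0, 0.0, 2, False)
buf.push(2, 1, 5.0, 3, True)
batch = buf.sample(batch_size=3)
print(td_targets(batch, target_q, gamma=0.99))
```

Because `next_state` is read from the stored tuple rather than regenerated by the current policy, the bootstrap term inherits whatever the older behavior policy did. This is the limitation that the abstract points to.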

November 7, 2025

Rewarding the Journey, Not Just the Destination: A Composite Path and Answer Self-Scoring Reward Mechanism for Test-Time Reinforcement Learning

arXiv:2510.17923v2 Announce Type: replace Abstract: Reinforcement Learning (RL) has emerged as a powerful paradigm for advancing Large Language Models (LLMs), achieving remarkable performance in complex reasoning domains such as mathematics and code generation. However, current RL methods face a fundamental…

November 7, 2025

Fraud-Proof Revenue Division on Subscription Platforms

arXiv:2511.04465v1 Announce Type: cross Abstract: We study a model of subscription-based platforms where users pay a fixed fee for unlimited access to content, and creators receive a share of the revenue. Existing approaches to detecting fraud predominantly rely on machine…

November 7, 2025

Efficient Model Development through Fine-tuning Transfer

arXiv:2503.20110v2 Announce Type: replace-cross Abstract: Modern LLMs struggle with efficient updates, as each new pretrained model version requires repeating expensive alignment processes. This challenge also applies to domain- or language-specific models, where fine-tuning on specialized data must be redone for…

November 7, 2025

FLOWR.root: A flow matching based foundation model for joint multi-purpose structure-aware 3D ligand generation and affinity prediction

arXiv:2510.02578v3 Announce Type: replace-cross Abstract: We present FLOWR.root, an equivariant flow-matching model for pocket-aware 3D ligand generation with joint binding affinity prediction and confidence estimation. The model supports de novo generation, pharmacophore-conditional sampling, fragment elaboration, and multi-endpoint affinity prediction (pIC50,…

November 7, 2025

Test-Time Warmup for Multimodal Large Language Models

arXiv:2509.10641v2 Announce Type: replace Abstract: Multimodal Large Language Models (MLLMs) hold great promise for advanced reasoning at the intersection of text and images, yet they have not fully realized this potential. MLLMs typically integrate an LLM, a vision encoder, and…

November 7, 2025