Value Mirror Descent for Reinforcement Learning

arXiv:2604.06039v1 Announce Type: cross Abstract: Value iteration-type methods have been extensively studied for computing a nearly optimal value function in reinforcement learning (RL). Under a generative sampling model, these methods can achieve sharper sample complexity than policy optimization approaches, particularly…

Automatic Replication of LLM Mistakes in Medical Conversations

arXiv:2512.20983v2 Announce Type: replace-cross Abstract: Large language models (LLMs) are increasingly evaluated in clinical settings using multi-dimensional rubrics that quantify reasoning quality, safety, and patient-centeredness. Yet, replicating specific mistakes in other LLMs is not straightforward and often requires manual…

Quantization-Robust LLM Unlearning via Low-Rank Adaptation

arXiv:2602.13151v3 Announce Type: replace Abstract: Large Language Model (LLM) unlearning aims to remove targeted knowledge from a trained model, but practical deployments often require post-training quantization (PTQ) for efficient inference. However, aggressive low-bit PTQ can mask unlearning updates, causing quantized…
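To make the masking effect concrete, here is a minimal toy sketch. It is not the paper's method or experimental setup. It assumes a symmetric per-tensor round-to-nearest quantizer as a stand-in for PTQ, and the weights, the "unlearning" update, and the `quantize` helper are all hypothetical. The update shifts each weight by about 0.01. At 8 bits that is larger than the quantization step, so the integer codes change. At 4 bits the step is roughly 0.106, so every updated weight rounds back to its original code and the quantized model is unchanged.

```python
def quantize(weights, bits):
    """Toy symmetric per-tensor round-to-nearest quantizer (hypothetical PTQ stand-in).

    Returns the integer codes and the dequantized weights.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    scale = max(abs(w) for w in weights) / qmax  # absmax scaling
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, [c * scale for c in codes]


# Hypothetical toy weights and a small "unlearning" update (not data from the paper).
# The largest-magnitude weight is left untouched so both tensors share one scale.
original = [0.30, -0.52, 0.11, 0.74]
delta = [0.010, -0.008, 0.012, 0.0]
unlearned = [w + d for w, d in zip(original, delta)]

for bits in (8, 4):
    codes_before, _ = quantize(original, bits)
    codes_after, _ = quantize(unlearned, bits)
    # True means quantization erased the update entirely.
    print(f"{bits}-bit:", codes_before, codes_after, codes_before == codes_after)
```

Keeping the scale fixed isolates the rounding effect. With a real per-tensor or per-group quantizer, the scale itself can also shift after unlearning, which this sketch deliberately avoids.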