Archives AI News

Ratio-Variance Regularized Policy Optimization for Efficient LLM Fine-tuning

arXiv:2601.03320v1 (announce type: new). Abstract: On-policy reinforcement learning (RL), particularly Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO), has become the dominant paradigm for fine-tuning large language models (LLMs). While policy ratio clipping stabilizes training, this heuristic hard…

Low Resource Reconstruction Attacks Through Benign Prompts

arXiv:2507.07947v3 (announce type: replace). Abstract: Recent advances in generative models, such as diffusion models, have raised concerns related to privacy, copyright infringement, and data stewardship. To better understand and control these risks, prior work has introduced techniques and attacks that…