Archives AI News

AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin

arXiv:2506.08473v3 Announce Type: replace Abstract: Fine-tuning large language models (LLMs) improves performance but introduces critical safety vulnerabilities: even minimal harmful data can severely compromise safety measures. We observe that perturbations orthogonal to the alignment direction – defined by weight differences…
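The abstract's idea of perturbations orthogonal to an alignment direction defined by weight differences can be sketched as a simple vector decomposition. This is only an illustration of the geometry, not the paper's method: the function name and the toy 2-D weights are invented here, and the alignment direction is assumed to be the difference between aligned and base weights.

```python
import numpy as np

def decompose_update(update, w_aligned, w_base):
    """Split a weight update into components parallel and orthogonal
    to the alignment direction d = w_aligned - w_base (assumed definition)."""
    d = w_aligned - w_base
    d_hat = d / np.linalg.norm(d)            # unit alignment direction
    parallel = np.dot(update, d_hat) * d_hat # component along alignment
    orthogonal = update - parallel           # component off the alignment axis
    return parallel, orthogonal

# Toy 2-D example (hypothetical weights, for illustration only)
w_base = np.array([0.0, 0.0])
w_aligned = np.array([1.0, 0.0])
update = np.array([0.3, 0.4])
par, orth = decompose_update(update, w_aligned, w_base)
# par ≈ [0.3, 0.0], orth ≈ [0.0, 0.4]
```

In this framing, constraining fine-tuning updates to the parallel component would keep weights inside the narrow region around the aligned model that the title calls a safety basin.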

VeRPO: Verifiable Dense Reward Policy Optimization for Code Generation

arXiv:2601.03525v1 Announce Type: new Abstract: Effective reward design is a central challenge in Reinforcement Learning (RL) for code generation. Mainstream pass/fail outcome rewards enforce functional correctness via executing unit tests, but the resulting sparsity limits potential performance gains. While recent…
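The sparsity problem the abstract describes can be illustrated with a minimal sketch: a pass/fail outcome reward is 1 only when every unit test passes, while a denser signal (here, fraction of tests passed, chosen for illustration; the paper's verifiable dense reward is not shown in this excerpt) still distinguishes partially correct programs. The function names and test format below are assumptions.

```python
def outcome_reward(candidate_fn, unit_tests):
    """Sparse pass/fail reward: 1.0 only if all unit tests pass.
    unit_tests: list of ((args...), expected_output) pairs (assumed format)."""
    for args, expected in unit_tests:
        try:
            if candidate_fn(*args) != expected:
                return 0.0
        except Exception:
            return 0.0  # crashing code also earns zero
    return 1.0

def dense_reward(candidate_fn, unit_tests):
    """Illustrative dense alternative: fraction of unit tests passed."""
    passed = 0
    for args, expected in unit_tests:
        try:
            passed += candidate_fn(*args) == expected
        except Exception:
            pass
    return passed / len(unit_tests)

# Hypothetical addition task: a buggy candidate that fails on negatives
tests = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
buggy_add = lambda a, b: a + b + (1 if a < 0 else 0)
# outcome_reward(buggy_add, tests) -> 0.0 (no gradient signal)
# dense_reward(buggy_add, tests)  -> 2/3 (partial credit survives)
```

The two candidates that pass zero tests and two of three tests look identical under the sparse reward, which is exactly the limitation on performance gains the abstract points to.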