Can LLM Safety Be Ensured by Constraining Parameter Regions?
arXiv:2602.17696v1
Announce Type: new
Abstract: Large language models (LLMs) are often assumed to contain “safety regions”: parameter subsets whose modification directly influences safety behaviors. We conduct a systematic evaluation of four safety region identification methods spanning different parameter granularities,…
