BiasJailbreak: Analyzing Ethical Biases and Jailbreak Vulnerabilities in Large Language Models
arXiv:2410.13334v5 Announce Type: replace-cross Abstract: Although large language models (LLMs) demonstrate impressive proficiency in various tasks, they present potential safety risks, such as 'jailbreaks', where malicious inputs can coerce LLMs into generating harmful content that bypasses safety alignment. In this paper,…
