Archives AI News

Procedural Fairness via Group Counterfactual Explanation

arXiv:2603.11140v1 Announce Type: new Abstract: Fairness in machine learning research has largely focused on outcome-oriented fairness criteria such as Equalized Odds, while comparatively less attention has been given to procedural-oriented fairness, which addresses how a model arrives at its predictions.…

Systematic Scaling Analysis of Jailbreak Attacks in Large Language Models

arXiv:2603.11149v1 Announce Type: new Abstract: Large language models remain vulnerable to jailbreak attacks, yet we still lack a systematic understanding of how jailbreak success scales with attacker effort across methods, model families, and harm types. We initiate a scaling-law framework…

Can AI Agents Agree?

arXiv:2603.01213v2 Announce Type: replace-cross Abstract: Large language models are increasingly deployed as cooperating agents, yet their behavior in adversarial consensus settings has not been systematically studied. We evaluate LLM-based agents on a Byzantine consensus game over scalar values using a…