Archives AI News

On the Occurrence of Critical Learning Periods in Neural Networks

arXiv:2510.09687v1 Announce Type: new Abstract: This study delves into the plasticity of neural networks, offering empirical support for the notion that critical learning periods and warm-starting performance loss can be avoided through simple adjustments to learning hyperparameters. The critical learning…

Evaluation of Differential Privacy Mechanisms on Federated Learning

arXiv:2510.09691v1 Announce Type: new Abstract: Federated learning trains a model across several clients without disclosing raw data. Despite advances in data privacy, risks remain. Differential Privacy (DP) is a technique to protect sensitive data by adding noise to…
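The abstract describes DP as protecting sensitive data by adding noise. A minimal sketch of one common instantiation in federated settings, the clipped-gradient Gaussian mechanism used in DP-SGD; the function name and parameters are illustrative, not from the paper:

```python
import numpy as np

def gaussian_mechanism(grad, clip_norm, noise_multiplier, rng):
    """Clip the gradient to at most clip_norm in L2 norm, then add
    Gaussian noise with std noise_multiplier * clip_norm (DP-SGD style)."""
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grad.shape)
    return clipped + noise

rng = np.random.default_rng(0)
g = np.array([3.0, 4.0])  # L2 norm 5, so clipping rescales it
private_g = gaussian_mechanism(g, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

In a federated pipeline this step would typically run on each client update before aggregation; the privacy guarantee then depends on the clip norm, noise multiplier, and number of rounds.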

Noise Injection Systemically Degrades Large Language Model Safety Guardrails

arXiv:2505.13500v2 Announce Type: replace-cross Abstract: Safety guardrails in large language models (LLMs) are a critical component in preventing harmful outputs. Yet, their resilience under perturbation remains poorly understood. In this paper, we investigate the robustness of safety fine-tuning in LLMs…
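The abstract studies guardrail resilience under perturbation. A minimal sketch of the kind of weight-noise injection such robustness probes use, assuming Gaussian perturbation of each parameter tensor; the function and dictionary layout are illustrative, not the paper's method:

```python
import numpy as np

def perturb_weights(weights, sigma, rng):
    """Return a copy of each parameter tensor with i.i.d. Gaussian noise
    of std sigma added, probing how behavior degrades as sigma grows."""
    return {name: w + rng.normal(0.0, sigma, size=w.shape)
            for name, w in weights.items()}

rng = np.random.default_rng(0)
params = {"layer0.weight": np.ones((2, 2))}
perturbed = perturb_weights(params, sigma=0.1, rng=rng)
```

A robustness sweep would evaluate safety behavior at increasing sigma to see where refusals break down.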

FSA: An Alternative Efficient Implementation of Native Sparse Attention Kernel

arXiv:2508.18224v2 Announce Type: replace-cross Abstract: Recent advances in sparse attention mechanisms have demonstrated strong potential for reducing the computational cost of long-context training and inference in large language models (LLMs). Native Sparse Attention (NSA), one state-of-the-art approach, introduces natively trainable,…
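The abstract concerns sparse attention kernels. A minimal NumPy sketch of the general idea, each query attending only to its highest-scoring keys; this is a generic top-k illustration, not NSA's actual block-sparse selection or the FSA kernel:

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k):
    """Each query attends only to its top_k highest-scoring keys;
    remaining scores are masked to -inf before the softmax, so their
    attention weights become exactly zero."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    kth = np.sort(scores, axis=-1)[:, -top_k][:, None]  # top_k-th largest per row
    masked = np.where(scores >= kth, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Real kernels like NSA/FSA gain their speedup by skipping the masked computation entirely on hardware, rather than materializing dense scores as this sketch does.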

Discursive Circuits: How Do Language Models Understand Discourse Relations?

arXiv:2510.11210v1 Announce Type: cross Abstract: Which components in transformer language models are responsible for discourse understanding? We hypothesize that sparse computational graphs, termed discursive circuits, control how models process discourse relations. Unlike simpler tasks, discourse relations involve longer spans…
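The abstract hypothesizes that a sparse subgraph of components controls discourse processing. A toy sketch of the ablation logic typically used to test such circuit hypotheses: keep the outputs of components inside the candidate circuit and replace everything else with baseline values. Names and structure are illustrative, not the paper's implementation:

```python
def run_with_circuit(component_outputs, circuit, baseline_outputs):
    """Keep each component's output if it belongs to the hypothesized
    circuit; otherwise substitute a baseline (e.g. mean-ablated) value.
    If task performance survives, the circuit is deemed sufficient."""
    return {name: out if name in circuit else baseline_outputs[name]
            for name, out in component_outputs.items()}

outputs = {"head_a": 1.0, "head_b": 2.0}
baseline = {"head_a": 0.0, "head_b": 0.0}
ablated = run_with_circuit(outputs, circuit={"head_a"}, baseline_outputs=baseline)
# ablated == {"head_a": 1.0, "head_b": 0.0}
```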