Archives AI News

Towards Understanding the Robustness of Sparse Autoencoders

arXiv:2604.18756v1 Announce Type: new Abstract: Large Language Models (LLMs) remain vulnerable to optimization-based jailbreak attacks that exploit internal gradient structure. While Sparse Autoencoders (SAEs) are widely used for interpretability, their robustness implications remain underexplored. We present a study of integrating…

Discrete Tilt Matching

arXiv:2604.18739v1 Announce Type: new Abstract: Masked diffusion large language models (dLLMs) are a promising alternative to autoregressive generation. While reinforcement learning (RL) methods have recently been adapted to dLLM fine-tuning, their objectives typically depend on sequence-level marginal likelihoods, which are…

The Cost of Relaxation: Evaluating the Error in Convex Neural Network Verification

arXiv:2604.18728v1 Announce Type: new Abstract: Many neural network (NN) verification systems represent the network’s input-output relation as a constraint program. Sound and complete representations involve integer constraints for simulating the activations. Recent works convexly relax the integer constraints, improving performance,…

Streaming Structured Inference with Flash-SemiCRF

arXiv:2604.18780v1 Announce Type: new Abstract: Semi-Markov Conditional Random Fields (semi-CRFs) assign labels to segments of a sequence rather than to individual positions, enabling exact inference over segment-level features and principled uncertainty estimates at their boundaries. However, existing implementations must materialize…