AI News Archives

Thoughtbubbles: an Unsupervised Method for Parallel Thinking in Latent Space

arXiv:2510.00219v1 (announce type: new)
Abstract: Current approaches for scaling inference-time compute in transformers rely on training them to emit explicit chain-of-thought tokens before producing an answer. While these methods are powerful, they are limited because they cannot be applied during…

The Pitfalls of KV Cache Compression

arXiv:2510.00231v1 (announce type: new)
Abstract: KV cache compression promises increased throughput and efficiency with negligible loss in performance. While the gains in throughput are indisputable and recent literature has indeed shown minimal degradation on particular benchmarks, in general the consequences…

UTrace: Poisoning Forensics for Private Collaborative Learning

arXiv:2409.15126v3 (announce type: replace-cross)
Abstract: Privacy-preserving machine learning (PPML) systems enable multiple data owners to collaboratively train models without revealing their raw, sensitive data by leveraging cryptographic protocols such as secure multi-party computation (MPC). While PPML offers strong privacy guarantees,…

Resolving UnderEdit & OverEdit with Iterative & Neighbor-Assisted Model Editing

arXiv:2503.11895v3 (announce type: replace-cross)
Abstract: Large Language Models (LLMs) are widely deployed in downstream tasks, but keeping their knowledge up-to-date via retraining or fine-tuning is often computationally expensive. Model editing provides a more efficient alternative by updating a targeted subset…

Debunk the Myth of SFT Generalization

arXiv:2510.00237v1 (announce type: new)
Abstract: A prevailing view holds that supervised fine-tuning (SFT) memorizes training data and fails to generalize, whereas reinforcement learning (RL) attains broader robustness. We revisit this claim through a systematic evaluation on two decision-making benchmarks, Sokoban…

Learning Inter-Atomic Potentials without Explicit Equivariance

arXiv:2510.00027v1 (announce type: new)
Abstract: Accurate and scalable machine-learned inter-atomic potentials (MLIPs) are essential for molecular simulations ranging from drug discovery to new material design. Current state-of-the-art models enforce roto-translational symmetries through equivariant neural network architectures, a hard-wired inductive bias…