Archives AI News

Language Models Can Learn from Verbal Feedback Without Scalar Rewards

arXiv:2509.22638v1 Announce Type: cross Abstract: LLMs are often trained with RL from human or AI feedback, yet such methods typically compress nuanced feedback into scalar rewards, discarding much of their richness and inducing scale imbalance. We propose treating verbal feedback…

September 29, 2025

Uncertainty-Aware Knowledge Tracing Models

arXiv:2509.21514v1 Announce Type: new Abstract: The main focus of research on Knowledge Tracing (KT) models is on model developments with the aim of improving predictive accuracy. Most of these models make the most incorrect predictions when students choose a distractor,…

September 29, 2025

Unstable Unlearning: The Hidden Risk of Concept Resurgence in Diffusion Models

arXiv:2410.08074v3 Announce Type: replace Abstract: Text-to-image diffusion models rely on massive, web-scale datasets. Training them from scratch is computationally expensive, and as a result, developers often prefer to make incremental updates to existing models. These updates often compose fine-tuning steps…

September 29, 2025

$mathbf{Li_2}$: A Framework on Dynamics of Feature Emergence and Delayed Generalization

arXiv:2509.21519v1 Announce Type: new Abstract: While the phenomenon of grokking, i.e., delayed generalization, has been studied extensively, it remains an open question whether there is a mathematical framework to characterize what kind of features emerge, how and in which conditions…

September 29, 2025

Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models

arXiv:2502.19649v4 Announce Type: replace Abstract: Representation Engineering (RepE) is a novel paradigm for controlling the behavior of LLMs. Unlike traditional approaches that modify inputs or fine-tune the model, RepE directly manipulates the model’s internal representations. As a result, it may…

September 29, 2025

TRiCo: Triadic Game-Theoretic Co-Training for Robust Semi-Supervised Learning

arXiv:2509.21526v1 Announce Type: new Abstract: We introduce TRiCo, a novel triadic game-theoretic co-training framework that rethinks the structure of semi-supervised learning by incorporating a teacher, two students, and an adversarial generator into a unified training paradigm. Unlike existing co-training or…

September 29, 2025

Preemptive Detection and Steering of LLM Misalignment via Latent Reachability

arXiv:2509.21528v1 Announce Type: new Abstract: Large language models (LLMs) are now ubiquitous in everyday tools, raising urgent safety concerns about their tendency to generate harmful content. The dominant safety approach — reinforcement learning from human feedback (RLHF) — effectively shapes…

September 29, 2025

Learnable Kernel Density Estimation for Graphs

arXiv:2505.21285v3 Announce Type: replace Abstract: This work proposes a framework LGKDE that learns kernel density estimation for graphs. The key challenge in graph density estimation lies in effectively capturing both structural patterns and semantic variations while maintaining theoretical guarantees. Combining…

September 29, 2025

Expert-guided Clinical Text Augmentation via Query-Based Model Collaboration

arXiv:2509.21530v1 Announce Type: new Abstract: Data augmentation is a widely used strategy to improve model robustness and generalization by enriching training datasets with synthetic examples. While large language models (LLMs) have demonstrated strong generative capabilities for this purpose, their applications…

September 29, 2025

RL-Obfuscation: Can Language Models Learn to Evade Latent-Space Monitors?

arXiv:2506.14261v3 Announce Type: replace Abstract: Latent-space monitors aim to detect undesirable behaviours in Large Language Models by leveraging their internal representations rather than relying solely on black-box outputs. These methods have shown promise in identifying behaviours such as deception and…

September 29, 2025