Archives AI News

MoE-Spec: Expert Budgeting for Efficient Speculative Decoding

arXiv:2602.16052v1. Announce Type: new.

Abstract: Speculative decoding accelerates Large Language Model (LLM) inference by verifying multiple drafted tokens in parallel. However, for Mixture-of-Experts (MoE) models, this parallelism introduces a severe bottleneck: large draft trees activate many unique experts, significantly increasing…

Multi-Objective Alignment of Language Models for Personalized Psychotherapy

arXiv:2602.16053v1. Announce Type: new.

Abstract: Mental health disorders affect over 1 billion people worldwide, yet access to care remains limited by workforce shortages and cost constraints. While AI systems show therapeutic promise, current alignment approaches optimize objectives independently, failing to…

Variable-Length Semantic IDs for Recommender Systems

arXiv:2602.16375v1. Announce Type: cross.

Abstract: Generative models are increasingly used in recommender systems, both for modeling user behavior as event sequences and for integrating large language models into recommendation pipelines. A key challenge in this setting is the extremely large…