Rethinking Large Language Model Distillation: A Constrained Markov Decision Process Perspective
arXiv:2509.22921v1
Announce Type: new
Abstract: We introduce a novel approach to large language model (LLM) distillation by formulating it as a constrained reinforcement learning problem. While recent work has begun exploring the integration of task-specific rewards into distillation processes, existing…
