Language Models Can Learn from Verbal Feedback Without Scalar Rewards
arXiv:2509.22638v1 Announce Type: cross Abstract: LLMs are often trained with RL from human or AI feedback, yet such methods typically compress nuanced feedback into scalar rewards, discarding much of its richness and inducing scale imbalance. We propose treating verbal feedback…
