The Initialization Determines Whether In-Context Learning Is Gradient Descent
arXiv:2512.04268v1 Announce Type: new Abstract: In-context learning (ICL) in large language models (LLMs) is a striking phenomenon, yet its underlying mechanisms remain only partially understood. Previous work connects linear self-attention (LSA) to gradient descent (GD), but this connection has primarily been…
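The LSA-GD connection the abstract refers to can be illustrated numerically: with suitably constructed weights, an unnormalized linear self-attention readout on an in-context regression prompt coincides with the prediction after one gradient-descent step from a zero initialization. The sketch below is a minimal illustration of that known equivalence (in the style of prior constructions), not the method of this paper; all variable names and the learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 32                      # input dimension, number of in-context examples

# In-context linear regression task: y_i = <w*, x_i>
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))       # context inputs x_1..x_n
y = X @ w_star                    # context targets y_1..y_n
x_q = rng.normal(size=d)          # query input

eta = 0.1                         # learning rate (arbitrary choice)

# One GD step from W0 = 0 on the in-context loss
# L(w) = 1/(2n) * sum_i (<w, x_i> - y_i)^2, giving w1 = (eta/n) * X^T y
w1 = (eta / n) * X.T @ y
pred_gd = w1 @ x_q

# Unnormalized linear self-attention with identity key/query maps:
# the query attends to context token i with score <x_i, x_q> and
# aggregates the values y_i, scaled by eta/n.
scores = X @ x_q
pred_lsa = (eta / n) * scores @ y

# The two predictions agree exactly (both equal (eta/n) * sum_i y_i <x_i, x_q>)
assert np.allclose(pred_gd, pred_lsa)
print(pred_gd, pred_lsa)
```

The equivalence here holds for the zero initialization; the paper's title suggests that the choice of initialization is exactly what the GD interpretation hinges on.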
