Bottlenecked Transformers: Periodic KV Cache Consolidation for Generalised Reasoning
arXiv:2505.16950v3 Abstract: Transformer LLMs have been shown to exhibit strong reasoning ability that scales with inference-time compute, most prominently through token-space “thinking” chains of thought. A growing line of work pushes extra computation into the model’s latent…
