Thin Keys, Full Values: Reducing KV Cache via Low-Dimensional Attention Selection
arXiv:2603.04427v4 Announce Type: replace Abstract: Standard Transformer attention uses identical dimensionality for queries, keys, and values, yet these components serve different roles: queries and keys produce scalar attention weights (selection), while values carry rich representations (value transfer). We show that…
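
To make the selection vs. value-transfer split concrete, below is a minimal single-head sketch in PyTorch in which queries and keys are projected to a narrow selection dimension while values keep the full model width. This is an illustration of the general idea only, not the paper's implementation: the class name ThinKeyAttention, the parameter d_select, and all dimensions are assumptions introduced here.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ThinKeyAttention(nn.Module):
        """Single-head attention with low-dimensional query/key
        projections (selection) and full-dimensional value
        projections (value transfer). Illustrative sketch only."""

        def __init__(self, d_model: int = 512, d_select: int = 64):
            super().__init__()
            # Queries and keys are "thin": they only need to produce
            # scalar attention scores, so a narrow projection suffices.
            self.w_q = nn.Linear(d_model, d_select, bias=False)
            self.w_k = nn.Linear(d_model, d_select, bias=False)
            # Values stay "full": they carry the representation that
            # is mixed into the output.
            self.w_v = nn.Linear(d_model, d_model, bias=False)
            self.w_o = nn.Linear(d_model, d_model, bias=False)
            self.scale = d_select ** -0.5

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, d_model)
            q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)
            # Cached keys are d_select-wide instead of d_model-wide,
            # shrinking the key half of the KV cache by a factor of
            # d_select / d_model (64/512 = 1/8 in this sketch).
            scores = torch.matmul(q, k.transpose(-2, -1)) * self.scale
            attn = F.softmax(scores, dim=-1)
            return self.w_o(torch.matmul(attn, v))

    # Usage: out = ThinKeyAttention()(torch.randn(2, 16, 512))

Under these assumptions, only the key side of the cache shrinks; values are stored at full width, which matches the "thin keys, full values" framing of the title.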
