AQPIM: Breaking the PIM Capacity Wall for LLMs with In-Memory Activation Quantization
arXiv:2604.18137v1 Announce Type: cross

Abstract: Processing-in-Memory (PIM) architectures offer a promising solution to the memory bottlenecks in data-intensive machine learning, yet they often overlook the growing challenge of the activation memory footprint. Conventional PIM approaches struggle with the massive KV cache sizes generated…
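To make the KV-cache motivation concrete, here is a minimal sketch of per-token symmetric int8 activation quantization, which is one generic way to shrink a KV cache by roughly 4x relative to float32. This illustrates activation quantization in general, not AQPIM's actual in-memory scheme (the abstract above is truncated); all function names below are hypothetical.

```python
# A minimal sketch of per-token symmetric int8 quantization for a KV cache.
# Illustrative only: NOT the paper's AQPIM method; names are hypothetical.
import numpy as np

def quantize_per_token(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Quantize activations to int8 with one scale per token (row).

    x: float32 array of shape (num_tokens, head_dim).
    Returns (q, scale): q is int8, scale is a float32 per-token scale.
    """
    # One scale per token: map each row's max absolute value to the int8 range.
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0
    scale = np.maximum(scale, 1e-8)  # guard against all-zero rows
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover approximate float32 activations."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    kv = rng.standard_normal((4, 64)).astype(np.float32)  # toy KV slice
    q, s = quantize_per_token(kv)
    err = np.abs(dequantize(q, s) - kv).max()
    print(f"int8 KV cache: ~4x smaller than float32; max abs error = {err:.4f}")
```

Per-token (rather than per-tensor) scales are a common choice for activations because token magnitudes vary widely; a single global scale would either clip outlier tokens or waste precision on typical ones.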
