Archives AI News

Value Flows

arXiv:2510.07650v2 | Announce Type: replace
Abstract: While most reinforcement learning methods today flatten the distribution of future returns to a single scalar value, distributional RL methods exploit the return distribution to provide stronger learning signals and to enable applications in exploration…

SEval-NAS: A Search-Agnostic Evaluation for Neural Architecture Search

arXiv:2603.00099v1 | Announce Type: new
Abstract: Neural architecture search (NAS) automates the discovery of neural networks that meet specified criteria, yet its evaluation procedures are often hardcoded, limiting the ability to introduce new metrics. This issue is especially pronounced in hardware-aware…

Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs

arXiv:2512.03324v2 | Announce Type: replace
Abstract: Memory and computation remain core bottlenecks in long-horizon LLM inference due to the quadratic cost of self-attention and the ever-growing key-value (KV) cache. Existing strategies for memory-bounded inference, such as quantization, offloading, or heuristic KV…

Wideband Power Amplifier Behavioral Modeling Using an Amplitude Conditioned LSTM

arXiv:2603.00101v1 | Announce Type: new
Abstract: Wideband power amplifiers exhibit complex nonlinear and memory effects that challenge traditional behavioral modeling approaches. This paper proposes a novel amplitude-conditioned long short-term memory (AC-LSTM) network that introduces explicit amplitude-dependent gating to enhance the…

LIDS: LLM Summary Inference Under the Layered Lens

arXiv:2603.00105v1 | Announce Type: new
Abstract: Large language models (LLMs) have gained significant attention from many researchers and practitioners in natural language processing (NLP) since the introduction of ChatGPT in 2022. One notable feature of ChatGPT is its ability to generate…

NNiT: Width-Agnostic Neural Network Generation with Structurally Aligned Weight Spaces

arXiv:2603.00180v1 | Announce Type: new
Abstract: Generative modeling of neural network parameters is often tied to architectures because standard parameter representations rely on known weight-matrix dimensions. Generation is further complicated by permutation symmetries that allow networks to model similar input-output functions…