Archives AI News

A Comparative Theoretical Analysis of Entropy Control Methods in Reinforcement Learning

arXiv:2604.09676v1. Announce Type: new. Abstract: Reinforcement learning (RL) has become a key approach for enhancing reasoning in large language models (LLMs), yet scalable training is often hindered by the rapid collapse of policy entropy, which leads to premature convergence and…

April 14, 2026

Belief-State RWKV for Reinforcement Learning under Partial Observability

arXiv:2604.09671v1. Announce Type: new. Abstract: We propose a stronger formulation of RL on top of RWKV-style recurrent sequence models, in which the fixed-size recurrent state is explicitly interpreted as a belief state rather than an opaque hidden vector. Instead of…

April 14, 2026

Human-like Working Memory Interference in Large Language Models

arXiv:2604.09670v1. Announce Type: new. Abstract: Intelligent systems must maintain and manipulate task-relevant information online to adapt to dynamic environments and changing goals. This capacity, known as working memory, is fundamental to human reasoning and intelligence. Despite having on the order…

April 14, 2026

Deliberative Alignment is Deep, but Uncertainty Remains: Inference time safety improvement in reasoning via attribution of unsafe behavior to base model

arXiv:2604.09665v1. Announce Type: new. Abstract: While the wide adoption of refusal training in large language models (LLMs) has showcased improvements in model safety, recent works have highlighted shortcomings due to the shallow nature of these alignment methods. To this end,…

April 14, 2026

FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios

arXiv:2604.07413v2. Announce Type: replace-cross. Abstract: The manufacturing sector is increasingly adopting Multimodal Large Language Models (MLLMs) to transition from simple perception to autonomous execution, yet current evaluations fail to reflect the rigorous demands of real-world manufacturing environments. Progress is hindered…

April 14, 2026

ExecTune: Effective Steering of Black-Box LLMs with Guide Models

arXiv:2604.09741v1. Announce Type: new. Abstract: For large language models deployed through black-box APIs, recurring inference costs often exceed one-time training costs. This motivates composed agentic systems that amortize expensive reasoning into reusable intermediate representations. We study a broad class of…

April 14, 2026

Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization

arXiv:2604.11510v1. Announce Type: cross. Abstract: To encourage diverse exploration in reinforcement learning (RL) for large language models (LLMs) without compromising accuracy, we propose Policy Split, a novel paradigm that bifurcates the policy into normal and high-entropy modes with a high-entropy…

April 14, 2026

Efficient Matrix Implementation for Rotary Position Embedding

arXiv:2604.09742v1. Announce Type: new. Abstract: Rotary Position Embedding (RoPE) has become a core component of modern Transformer architectures across language, vision, and 3D domains. However, existing implementations rely on vector-level split and merge operations that introduce non-negligible computational overhead, often…

April 14, 2026

The Phantom of PCIe: Constraining Generative Artificial Intelligences for Practical Peripherals Trace Synthesizing

arXiv:2411.06376v3. Announce Type: replace. Abstract: Peripheral Component Interconnect Express (PCIe) is the de facto interconnect standard for high-speed peripherals and CPUs. The development of PCIe devices for emerging applications requires realistic Transaction Layer Packet (TLP) traces that accurately simulate device-CPU…

April 14, 2026

Explainable Human Activity Recognition: A Unified Review of Concepts and Mechanisms

arXiv:2604.09799v1. Announce Type: new. Abstract: Human activity recognition (HAR) has become a key component of intelligent systems for healthcare monitoring, assistive living, smart environments, and human-computer interaction. Although deep learning has substantially improved HAR performance on multivariate sensor data, the…

April 14, 2026