Archives AI News

HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents

arXiv:2603.00977v2 Announce Type: replace-cross Abstract: Large language model (LLM) agents have recently demonstrated strong capabilities in interactive decision-making, yet they remain fundamentally limited in long-horizon tasks that require structured planning and reliable execution. Existing approaches predominantly rely on flat autoregressive…

May 6, 2026

Exposing LLM Safety Gaps Through Mathematical Encoding:New Attacks and Systematic Analysis

arXiv:2605.03441v1 Announce Type: cross Abstract: Large language models (LLMs) employ safety mechanisms to prevent harmful outputs, yet these defenses primarily rely on semantic pattern matching. We show that encoding harmful prompts as coherent mathematical problems — using formalisms such as…

May 6, 2026

When Safety Geometry Collapses: Fine-Tuning Vulnerabilities in Agentic Guard Models

arXiv:2605.02914v1 Announce Type: new Abstract: A guard model fine-tuned on entirely benign data can lose all safety alignment — not through adversarial manipulation, but through standard domain specialization. We demonstrate this failure across three purpose-built safety classifiers — LlamaGuard, WildGuard,…

May 6, 2026

From Synthesis to Clinical Assistance: A Strategy-Aware Agent Framework for Autism Intervention based on Real Clinical Dataset

arXiv:2605.02916v1 Announce Type: new Abstract: The development of AI-assisted Early Intensive Behavioral Intervention (EIBI) for Autism Spectrum Disorder (ASD) is severely constrained by data scarcity. Furthermore, while Applied Behavior Analysis (ABA) serves as the gold standard for clinical intervention, general-purpose…

May 6, 2026

Agentic AI-Based Joint Computing and Networking via Mixture of Experts and Large Language Models

arXiv:2605.02911v1 Announce Type: new Abstract: Future sixth-generation (6G) mobile networks are envisioned to be equipped with a diverse set of powerful, yet highly specialized, optimization experts. Such a promising vision is concurrently expected to give rise to the need for…

May 6, 2026

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

arXiv:2605.02913v1 Announce Type: new Abstract: Reinforcement learning (RL) has become a central post-training tool for improving the reasoning abilities of large language models (LLMs). In these systems, the rollout, the trajectory sampled from a prompt to termination, including intermediate reasoning…

May 6, 2026

Delay, Plateau, or Collapse: Evaluating the Impact of Systematic Verification Error on RLVR

arXiv:2605.02909v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become a powerful approach for improving the reasoning capabilities of large language models (LLMs). While RLVR is designed for tasks with verifiable ground-truth answers, real-world verifiers (e.g., static…

May 6, 2026

On the Invariants of Softmax Attention

arXiv:2605.02907v1 Announce Type: new Abstract: Softmax attention maps every query–key interaction into a probability distribution, but the underlying structure remains largely unexplored. We define the emph{energy field}, the row-centered attention logit, and show that it exhibits invariant properties across models,…

May 6, 2026

An End-to-End Framework for Building Large Language Models for Software Operations

arXiv:2605.02906v1 Announce Type: new Abstract: In the field of software operations, Large Language Models (LLMs) have attracted increasing attention. However, existing research has not yet achieved efficient and effective end-to-end intelligent operations due to low-quality data, fragmented knowledge and insufficient…

May 6, 2026

psifx — Psychological and Social Interactions Feature Extraction Package

arXiv:2407.10266v5 Announce Type: replace-cross Abstract: psifx is a plug-and-play multi-modal feature extraction toolkit, aiming to facilitate and democratize the use of state-of-the-art machine learning techniques for human sciences research. It is motivated by a need (a) to automate and standardize…

May 6, 2026