Archives AI News

Not all tokens are needed(NAT): token efficient reinforcement learning

arXiv:2603.06619v1 Announce Type: new
Abstract: Reinforcement learning (RL) has become a key driver of progress in large language models, but scaling RL to long chain-of-thought (CoT) trajectories is increasingly constrained by backpropagation over every generated token. Even with optimized rollout…

March 10, 2026

Stronger Enforcement of Instruction Hierarchy via Augmented Intermediate Representations

arXiv:2505.18907v2 Announce Type: replace-cross
Abstract: Prompt injection attacks are a critical security vulnerability in large language models (LLMs), allowing attackers to hijack model behavior by injecting malicious instructions within the input context. Recent defense mechanisms have leveraged an Instruction Hierarchy…

March 10, 2026

Reward Under Attack: Analyzing the Robustness and Hackability of Process Reward Models

arXiv:2603.06621v1 Announce Type: new
Abstract: Process Reward Models (PRMs) are rapidly becoming the backbone of LLM reasoning pipelines, yet we demonstrate that state-of-the-art PRMs are systematically exploitable under adversarial optimization pressure. To address this, we introduce a three-tiered diagnostic framework…

March 10, 2026

Crowdsourcing the Frontier: Advancing Hybrid Physics-ML Climate Simulation via a $50,000 Kaggle Competition

arXiv:2511.20963v4 Announce Type: replace-cross
Abstract: Subgrid machine-learning (ML) parameterizations have the potential to introduce a new generation of climate models that incorporate the effects of higher-resolution physics without incurring the prohibitive computational cost associated with more explicit physics-based simulations. However,…

March 10, 2026

From ARIMA to Attention: Power Load Forecasting Using Temporal Deep Learning

arXiv:2603.06622v1 Announce Type: new
Abstract: Accurate short-term power load forecasting is important to effectively manage, optimize, and ensure the robustness of modern power systems. This paper performs an empirical evaluation of a traditional statistical model and deep learning approaches for…

March 10, 2026

MEM: Multi-Scale Embodied Memory for Vision Language Action Models

arXiv:2603.03596v2 Announce Type: replace-cross
Abstract: Conventionally, memory in end-to-end robotic learning involves inputting a sequence of past observations into the learned policy. However, in complex multi-stage real-world tasks, the robot’s memory must represent past events at multiple levels of granularity:…

March 10, 2026

Advances in GRPO for Generation Models: A Survey

arXiv:2603.06623v1 Announce Type: new
Abstract: Large-scale flow matching models have achieved strong performance across generative tasks such as text-to-image, video, 3D, and speech synthesis. However, aligning their outputs with human preferences and task-specific objectives remains challenging. Flow-GRPO extends Group Relative…

March 10, 2026

Viewpoint-Agnostic Grasp Pipeline using VLM and Partial Observations

arXiv:2603.07866v1 Announce Type: cross
Abstract: Robust grasping in cluttered, unstructured environments remains challenging for mobile legged manipulators due to occlusions that lead to partial observations, unreliable depth estimates, and the need for collision-free, execution-feasible approaches. In this paper we present…

March 10, 2026

Pavement Missing Condition Data Imputation through Collective Learning-Based Graph Neural Networks

arXiv:2603.06625v1 Announce Type: new
Abstract: Pavement condition data is important in providing information regarding the current state of the road network and in determining the needs of maintenance and rehabilitation treatments. However, the condition data is often incomplete due to…

March 10, 2026

SaiVLA-0: Cerebrum–Pons–Cerebellum Tripartite Architecture for Compute-Aware Vision-Language-Action

arXiv:2603.08124v1 Announce Type: cross
Abstract: We revisit Vision-Language-Action through a neuroscience-inspired triad. Biologically, the Cerebrum provides stable high-level multimodal priors and remains frozen; the Pons Adapter integrates these cortical features with real-time proprioceptive inputs and compiles intent into execution-ready tokens;…

March 10, 2026