Archives AI News

Value-Conflict Diagnostics Reveal Widespread Alignment Faking in Language Models

arXiv:2604.20995v1 Announce Type: new Abstract: Alignment faking, where a model behaves aligned with developer policy when monitored but reverts to its own preferences when unobserved, is a concerning yet poorly understood phenomenon, in part because current diagnostic tools remain limited.…

April 25, 2026

MISTY: High-Throughput Motion Planning via Mixer-based Single-step Drifting

arXiv:2604.21489v1 Announce Type: cross Abstract: Multi-modal trajectory generation is essential for safe autonomous driving, yet existing diffusion-based planners suffer from high inference latency due to iterative neural function evaluations. This paper presents MISTY (Mixer-based Inference for Single-step Trajectory-drifting Yield), a…

April 25, 2026

Geometric Monomial (GEM): a family of rational 2N-differentiable activation functions

arXiv:2604.21677v1 Announce Type: cross Abstract: The choice of activation function plays a crucial role in the optimization and performance of deep neural networks. While the Rectified Linear Unit (ReLU) remains the dominant choice due to its simplicity and effectiveness, its…

April 25, 2026

Cognitive Amplification vs Cognitive Delegation in Human-AI Systems: A Metric Framework

arXiv:2603.18677v2 Announce Type: replace-cross Abstract: Artificial intelligence is increasingly embedded in human decision making. In some cases, it enhances human reasoning. In others, it fosters excessive cognitive dependence. This paper introduces a conceptual and mathematical framework to distinguish cognitive amplification,…

April 25, 2026

Adaptive Test-Time Compute Allocation with Evolving In-Context Demonstrations

arXiv:2604.21018v1 Announce Type: new Abstract: While scaling test-time compute can substantially improve model performance, existing approaches either rely on static compute allocation or sample from fixed generation distributions. In this work, we introduce a test-time compute allocation framework that jointly…

April 25, 2026

The Last Harness You’ll Ever Build

arXiv:2604.21003v1 Announce Type: new Abstract: AI agents are increasingly deployed on complex, domain-specific workflows — navigating enterprise web applications that require dozens of clicks and form fills, orchestrating multi-step research pipelines that span search, extraction, and synthesis, automating code review…

April 25, 2026

Value-Conflict Diagnostics Reveal Widespread Alignment Faking in Language Models

April 25, 2026

Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks

arXiv:2604.20987v1 Announce Type: new Abstract: Long horizon interactive environments are a testbed for evaluating agents skill usage abilities. These environments demand multi step reasoning, the chaining of multiple skills over many timesteps, and robust decision making under delayed rewards and…

April 25, 2026

Mitigating Lost in Multi-turn Conversation via Curriculum RL with Verifiable Accuracy and Abstention Rewards

arXiv:2510.18731v2 Announce Type: replace-cross Abstract: Large Language Models demonstrate strong capabilities in single-turn instruction following but suffer from Lost-in-Conversation (LiC), a degradation in performance as information is revealed progressively in multi-turn settings. Motivated by the current progress on Reinforcement Learning…

April 25, 2026

Who Defines Fairness? Target-Based Prompting for Demographic Representation in Generative Models

arXiv:2604.21036v1 Announce Type: new Abstract: Text-to-image(T2I) models like Stable Diffusion and DALL-E have made generative AI widely accessible, yet recent studies reveal that these systems often replicate societal biases, particularly in how they depict demographic groups across professions. Prompts such…

April 25, 2026