Archives AI News

Do Schwartz Higher-Order Values Help Sentence-Level Human Value Detection? A Study of Hierarchical Gating and Calibration

arXiv:2602.00913v3 Announce Type: replace-cross Abstract: Human value detection from single sentences is a sparse, imbalanced multi-label task. We study whether Schwartz higher-order (HO) categories help this setting on ValueEval’24 / ValuesML (74K English sentences) under a compute-frugal budget. Rather than…

April 8, 2026

Vintix II: Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner

arXiv:2604.05112v1 Announce Type: new Abstract: Recent progress in in-context reinforcement learning (ICRL) has demonstrated its potential for training generalist agents that can acquire new tasks directly at inference. Algorithm Distillation (AD) pioneered this paradigm and was subsequently scaled to multi-domain…

April 8, 2026

PhaseFlow4D: Physically Constrained 4D Beam Reconstruction via Feedback-Guided Latent Diffusion

arXiv:2604.03885v2 Announce Type: replace-cross Abstract: We address the problem of recovering a time-varying 4D distribution from a sparse sequence of 2D projections – analogous to novel-view synthesis from sparse cameras, but applied to the 4D transverse phase space density $\rho(x,p_x,y,p_y)$…

April 8, 2026

Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning

arXiv:2604.05134v1 Announce Type: new Abstract: How can you get a language model to reason in a task it natively struggles with? We study how reasoning evolves in a language model — from supervised fine-tuning (SFT) to reinforcement learning (RL) —…

April 8, 2026

Value Mirror Descent for Reinforcement Learning

arXiv:2604.06039v1 Announce Type: cross Abstract: Value iteration-type methods have been extensively studied for computing a nearly optimal value function in reinforcement learning (RL). Under a generative sampling model, these methods can achieve sharper sample complexity than policy optimization approaches, particularly…

April 8, 2026

The illusion of reasoning: step-level evaluation reveals decorative chain-of-thought in frontier language models

arXiv:2603.22816v2 Announce Type: replace-cross Abstract: Language models increasingly “show their work” by writing step-by-step reasoning before answering. But are these reasoning steps genuinely used, or decorative narratives generated after the model has already decided? We introduce step-level faithfulness evaluation –…

April 8, 2026

Service Placement in Small Cell Networks Using Distributed Best Arm Identification in Linear Bandits

arXiv:2506.22480v2 Announce Type: replace-cross Abstract: As users in small cell networks increasingly rely on computation-intensive services, cloud-based access often results in high latency. Multi-access edge computing (MEC) mitigates this by bringing computational resources closer to end users, with small base…

April 8, 2026

Automatic Replication of LLM Mistakes in Medical Conversations

arXiv:2512.20983v2 Announce Type: replace-cross Abstract: Large language models (LLMs) are increasingly evaluated in clinical settings using multi-dimensional rubrics which quantify reasoning quality, safety, and patient-centeredness. Yet, replicating specific mistakes in other LLM models is not straightforward and often requires manual…

April 8, 2026

Quantization-Robust LLM Unlearning via Low-Rank Adaptation

arXiv:2602.13151v3 Announce Type: replace Abstract: Large Language Model (LLM) unlearning aims to remove targeted knowledge from a trained model, but practical deployments often require post-training quantization (PTQ) for efficient inference. However, aggressive low-bit PTQ can mask unlearning updates, causing quantized…

April 8, 2026

Choosing the Right Regularizer for Applied ML: Simulation Benchmarks of Popular Scikit-learn Regularization Frameworks

arXiv:2604.03541v2 Announce Type: replace Abstract: This study surveys the historical development of regularization, tracing its evolution from stepwise regression in the 1960s to recent advancements in formal error control, structured penalties for non-independent features, Bayesian methods, and l0-based regularization (among…

April 8, 2026