Archives AI News

Do Schwartz Higher-Order Values Help Sentence-Level Human Value Detection? A Study of Hierarchical Gating and Calibration

arXiv:2602.00913v3 Announce Type: replace-cross Abstract: Human value detection from single sentences is a sparse, imbalanced multi-label task. We study whether Schwartz higher-order (HO) categories help this setting on ValueEval’24 / ValuesML (74K English sentences) under a compute-frugal budget. Rather than…

April 8, 2026

Vintix II: Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner

arXiv:2604.05112v1 Announce Type: new Abstract: Recent progress in in-context reinforcement learning (ICRL) has demonstrated its potential for training generalist agents that can acquire new tasks directly at inference. Algorithm Distillation (AD) pioneered this paradigm and was subsequently scaled to multi-domain…

April 8, 2026

PhaseFlow4D: Physically Constrained 4D Beam Reconstruction via Feedback-Guided Latent Diffusion

arXiv:2604.03885v2 Announce Type: replace-cross Abstract: We address the problem of recovering a time-varying 4D distribution from a sparse sequence of 2D projections – analogous to novel-view synthesis from sparse cameras, but applied to the 4D transverse phase space density $\rho(x,p_x,y,p_y)$…

April 8, 2026

Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning

arXiv:2604.05134v1 Announce Type: new Abstract: How can you get a language model to reason in a task it natively struggles with? We study how reasoning evolves in a language model — from supervised fine-tuning (SFT) to reinforcement learning (RL) —…

April 8, 2026

Value Mirror Descent for Reinforcement Learning

arXiv:2604.06039v1 Announce Type: cross Abstract: Value iteration-type methods have been extensively studied for computing a nearly optimal value function in reinforcement learning (RL). Under a generative sampling model, these methods can achieve sharper sample complexity than policy optimization approaches, particularly…

April 8, 2026

Not All Turns Are Equally Hard: Adaptive Thinking Budgets For Efficient Multi-Turn Reasoning

arXiv:2604.05164v1 Announce Type: new Abstract: As LLM reasoning performance plateaus, improving inference-time compute efficiency is crucial to mitigate overthinking and long thinking traces even for simple queries. Prior approaches including length regularization, adaptive routing, and difficulty-based budget allocation primarily focus…

April 8, 2026

Understanding Uncertainty Sampling via Equivalent Loss

arXiv:2307.02719v4 Announce Type: replace Abstract: Uncertainty sampling is a prevalent active learning algorithm that sequentially queries annotations for the data samples about which the current prediction model is uncertain. However, the usage of uncertainty sampling has been largely heuristic: There…

April 8, 2026

General Multimodal Protein Design Enables DNA-Encoding of Chemistry

arXiv:2604.05181v1 Announce Type: new Abstract: Evolution is an extraordinary engine for enzymatic diversity, yet the chemistry it has explored remains a narrow slice of what DNA can encode. Deep generative models can design new proteins that bind ligands, but none…

April 8, 2026

How Humans Help LLMs: Assessing and Incentivizing Human Preference Annotators

arXiv:2502.06387v2 Announce Type: replace Abstract: Human-annotated preference data play an important role in aligning large language models (LLMs). In this paper, we study two connected questions: how to monitor the quality of human preference annotators and how to incentivize them…

April 8, 2026

Cross-fitted Proximal Learning for Model-Based Reinforcement Learning

arXiv:2604.05185v1 Announce Type: new Abstract: Model-based reinforcement learning is attractive for sequential decision-making because it explicitly estimates reward and transition models and then supports planning through simulated rollouts. In offline settings with hidden confounding, however, models learned directly from observational…

April 8, 2026