Archives AI News

Not All Turns Are Equally Hard: Adaptive Thinking Budgets For Efficient Multi-Turn Reasoning

arXiv:2604.05164v1 Announce Type: new Abstract: As LLM reasoning performance plateaus, improving inference-time compute efficiency is crucial to mitigate overthinking and long thinking traces even for simple queries. Prior approaches, including length regularization, adaptive routing, and difficulty-based budget allocation, primarily focus…

April 8, 2026

Understanding Uncertainty Sampling via Equivalent Loss

arXiv:2307.02719v4 Announce Type: replace Abstract: Uncertainty sampling is a prevalent active learning algorithm that sequentially queries annotations for the data samples about which the current prediction model is uncertain. However, the usage of uncertainty sampling has been largely heuristic: There…

April 8, 2026

General Multimodal Protein Design Enables DNA-Encoding of Chemistry

arXiv:2604.05181v1 Announce Type: new Abstract: Evolution is an extraordinary engine for enzymatic diversity, yet the chemistry it has explored remains a narrow slice of what DNA can encode. Deep generative models can design new proteins that bind ligands, but none…

April 8, 2026

How Humans Help LLMs: Assessing and Incentivizing Human Preference Annotators

arXiv:2502.06387v2 Announce Type: replace Abstract: Human-annotated preference data play an important role in aligning large language models (LLMs). In this paper, we study two connected questions: how to monitor the quality of human preference annotators and how to incentivize them…

April 8, 2026

Cross-fitted Proximal Learning for Model-Based Reinforcement Learning

arXiv:2604.05185v1 Announce Type: new Abstract: Model-based reinforcement learning is attractive for sequential decision-making because it explicitly estimates reward and transition models and then supports planning through simulated rollouts. In offline settings with hidden confounding, however, models learned directly from observational…

April 8, 2026

Inference-Time Scaling of Diffusion Language Models via Trajectory Refinement

arXiv:2507.08390v3 Announce Type: replace Abstract: Discrete diffusion models have recently emerged as strong alternatives to autoregressive language models, matching their performance through large-scale training. However, inference-time control remains relatively underexplored. In this work, we study how to steer generation toward…

April 8, 2026

FNO$^{\angle \theta}$: Extended Fourier neural operator for learning state and optimal control of distributed parameter systems

arXiv:2604.05187v1 Announce Type: new Abstract: We propose an extended Fourier neural operator (FNO) architecture for learning state and linear quadratic additive optimal control of systems governed by partial differential equations. Using the Ehrenpreis-Palamodov fundamental principle, we show that any state…

April 8, 2026

Near-optimal Linear Predictive Clustering in Non-separable Spaces via MIP and QPBO Reductions

arXiv:2511.10809v3 Announce Type: replace Abstract: Linear Predictive Clustering (LPC) partitions samples based on shared linear relationships between feature and target variables, with numerous applications including marketing, medicine, and education. Greedy optimization methods, commonly used for LPC, alternate between clustering and…

April 8, 2026

Vehicle-as-Prompt: A Unified Deep Reinforcement Learning Framework for Heterogeneous Fleet Vehicle Routing Problem

arXiv:2604.05195v1 Announce Type: new Abstract: Unlike traditional homogeneous routing problems, the Heterogeneous Fleet Vehicle Routing Problem (HFVRP) involves heterogeneous fixed costs, variable travel costs, and capacity constraints, rendering solution quality highly sensitive to vehicle selection. Furthermore, real-world logistics applications often…

April 8, 2026

A Hessian-Free Actor-Critic Algorithm for Bi-Level Reinforcement Learning with Applications to LLM Fine-Tuning

arXiv:2601.16399v4 Announce Type: replace Abstract: We study a structured bi-level optimization problem where the upper-level objective is a smooth function and the lower-level problem is policy optimization in a Markov decision process (MDP). The upper-level decision variable parameterizes the reward…

April 8, 2026