Archives AI News

Exploring Natural Language-Based Strategies for Efficient Number Learning in Children through Reinforcement Learning

arXiv:2410.08334v2 Announce Type: replace-cross Abstract: In this paper, we build a reinforcement learning framework to study how children compose numbers using base-ten blocks. Studying numerical cognition in toddlers offers a powerful window into the learning process itself, because numbers sit…

April 10, 2026

Inference-Time Scaling of Diffusion Language Models via Trajectory Refinement

arXiv:2507.08390v4 Announce Type: replace Abstract: Discrete diffusion models have recently emerged as strong alternatives to autoregressive language models, matching their performance through large-scale training. However, inference-time control remains relatively underexplored. In this work, we study how to steer generation toward…

April 10, 2026

Tensor-Efficient High-Dimensional Q-learning

arXiv:2511.03595v2 Announce Type: replace Abstract: High-dimensional reinforcement learning(RL) faces challenges with complex calculations and low sample efficiency in large state-action spaces. Q-learning algorithms struggle particularly with the curse of dimensionality, where the number of state-action pairs grows exponentially with problem…

April 10, 2026

Reason in Chains, Learn in Trees: Self-Rectification and Grafting for Multi-turn Agent Policy Optimization

arXiv:2604.07165v1 Announce Type: cross Abstract: Reinforcement learning for Large Language Model agents is often hindered by sparse rewards in multi-step reasoning tasks. Existing approaches like Group Relative Policy Optimization treat sampled trajectories as independent chains, assigning uniform credit to all…

April 10, 2026

Gaussian Approximation for Asynchronous Q-learning

arXiv:2604.07323v1 Announce Type: cross Abstract: In this paper, we derive rates of convergence in the high-dimensional central limit theorem for Polyak-Ruppert averaged iterates generated by the asynchronous Q-learning algorithm with a polynomial stepsize $k^{-omega},, omega in (1/2, 1]$. Assuming that…

April 10, 2026

RAGEN-2: Reasoning Collapse in Agentic RL

arXiv:2604.06268v1 Announce Type: new Abstract: RL training of multi-turn LLM agents is inherently unstable, and reasoning quality directly determines task performance. Entropy is widely used to track reasoning stability. However, entropy only measures diversity within the same input, and cannot…

April 10, 2026

Evaluating PQC KEMs, Combiners, and Cascade Encryption via Adaptive IND-CPA Testing Using Deep Learning

arXiv:2604.06942v1 Announce Type: cross Abstract: Ensuring ciphertext indistinguishability is fundamental to cryptographic security, but empirically validating this property in real implementations and hybrid settings presents practical challenges. The transition to post-quantum cryptography (PQC), with its hybrid constructions combining classical and…

April 10, 2026

SMT-AD: a scalable quantum-inspired anomaly detection approach

arXiv:2604.06265v1 Announce Type: new Abstract: Quantum-inspired tensor networks algorithms have shown to be effective and efficient models for machine learning tasks, including anomaly detection. Here, we propose a highly parallelizable quantum-inspired approach which we call SMT-AD from Superposition of Multiresolution…

April 10, 2026

MO-RiskVAE: A Multi-Omics Variational Autoencoder for Survival Risk Modeling in Multiple MyelomaMO-RiskVAE

arXiv:2604.06267v1 Announce Type: new Abstract: Multimodal variational autoencoders (VAEs) have emerged as a powerful framework for survival risk modeling in multiple myeloma by integrating heterogeneous omics and clinical data. However, when trained under survival supervision, standard latent regularization strategies often…

April 10, 2026

$S^3$: Stratified Scaling Search for Test-Time in Diffusion Language Models

arXiv:2604.06260v1 Announce Type: new Abstract: Test-time scaling investigates whether a fixed diffusion language model (DLM) can generate better outputs when given more inference compute, without additional training. However, naive best-of-$K$ sampling is fundamentally limited because it repeatedly draws from the…

April 10, 2026