Archives AI News

Put CASH on Bandits: A Max K-Armed Problem for Automated Machine Learning

arXiv:2505.05226v2. Abstract: Combined Algorithm Selection and Hyperparameter Optimization (CASH) is a challenging resource allocation problem in the field of AutoML. We propose MaxUCB, a max k-armed bandit method to trade off exploring different model classes and…

VisPlay: Self-Evolving Vision-Language Models from Images

arXiv:2511.15661v1. Abstract: Reinforcement learning (RL) provides a principled framework for improving Vision-Language Models (VLMs) on complex reasoning tasks. However, existing RL approaches often rely on human-annotated labels or task-specific heuristics to define verifiable rewards, both of which…

$\pi^{*}_{0.6}$: a VLA That Learns From Experience

arXiv:2511.14759v2. Abstract: We study how vision-language-action (VLA) models can improve through real-world deployments via reinforcement learning (RL). We present a general-purpose method, RL with Experience and Corrections via Advantage-conditioned Policies (RECAP), that provides for RL training of…