Fast Online Learning with Gaussian Prior-Driven Hierarchical Unimodal Thompson Sampling
arXiv:2602.15972v1 Announce Type: new Abstract: We study a class of Multi-Armed Bandit (MAB) problems in which arms with Gaussian reward feedback are clustered. This arm structure arises in many real-world applications, for example, mmWave communications and portfolio…
