arXiv:2502.14790v5 Announce Type: replace-cross
Abstract: We develop a form of Thompson sampling for online learning under full feedback – also known as prediction with expert advice – where the learner's prior is defined over the space of the adversary's future actions, rather than the space of experts. We show that regret decomposes into the regret the learner expected a priori, plus a prior-robustness-type term we call excess regret. In the classical finite-expert setting, this recovers optimal rates. As a first step toward practical online learning in settings with a potentially uncountably infinite number of experts, we show that Thompson sampling over the $d$-dimensional unit cube, using a Gaussian process prior widely used in the Bayesian optimization literature, achieves an $\mathcal{O}\Big(\beta\sqrt{Td\log\big(1+\sqrt{d}\,\frac{\lambda}{\beta}\big)}\Big)$ regret rate against a $\beta$-bounded, $\lambda$-Lipschitz adversary.
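
The abstract's central idea – placing the prior on the adversary's future loss sequences rather than on the experts – can be illustrated with a minimal sketch in the finite-expert setting. This is not the paper's algorithm or analysis; it is a hypothetical toy where the prior is assumed to be i.i.d. Uniform[0,1] losses per expert per round, and each round the learner samples a completion of the loss sequence from that prior and plays the expert that would be optimal under the sample:

```python
import numpy as np

def thompson_experts(losses, rng):
    """Toy Thompson sampling for prediction with expert advice.

    Hypothetical sketch: the prior over the adversary's future actions is
    i.i.d. Uniform[0,1] loss per expert per round.  Each round the learner
    samples the remaining loss vectors from this prior and plays the expert
    minimizing observed-plus-sampled cumulative loss.
    """
    T, K = losses.shape
    cum = np.zeros(K)   # observed cumulative losses (full feedback)
    total = 0.0
    for t in range(T):
        # Sample the adversary's remaining T - t rounds from the prior.
        future = rng.uniform(0.0, 1.0, size=(T - t, K)).sum(axis=0)
        arm = int(np.argmin(cum + future))  # optimal expert under the sample
        total += losses[t, arm]
        cum += losses[t]                    # full feedback: see every loss
    return total, cum

# Usage: a stochastic adversary with one clearly better expert.
rng = np.random.default_rng(0)
T, K = 500, 5
means = np.array([0.3, 0.5, 0.5, 0.6, 0.7])
losses = (rng.uniform(size=(T, K)) < means).astype(float)
total, cum = thompson_experts(losses, rng)
regret = total - cum.min()  # regret against the best expert in hindsight
```

Sampling unobserved future losses and best-responding to them makes the sampled future act as a perturbation, much like follow-the-perturbed-leader; the paper's decomposition separates the regret anticipated under the prior from the excess regret incurred when the adversary departs from it.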
