arXiv:2505.06709v2 Announce Type: replace
Abstract: We study Online Convex Optimization with adversarial constraints (COCO). At each round a learner selects an action from a convex decision set and then an adversary reveals a convex cost and a convex constraint function. The goal of the learner is to select a sequence of actions to minimize both regret and the cumulative constraint violation (CCV) over a horizon of length $T$. The best-known policy for this problem achieves $O(sqrt{T})$ regret and $tilde{O}(sqrt{T})$ CCV. In this paper, we improve this by trading off regret to achieve substantially smaller CCV. This trade-off is especially important in safety-critical applications, where satisfying the safety constraints is non-negotiable. Specifically, for any bounded convex cost and constraint functions, we propose an online policy that achieves $tilde{O}(sqrt{dT}+ T^beta)$ regret and $tilde{O}(dT^{1-beta})$ CCV, where $d$ is the dimension of the decision set and $beta in [0,1]$ is a tunable parameter. We begin with a special case, called the $textsf{Constrained Expert}$ problem, where the decision set is a probability simplex and the cost and constraint functions are linear. Leveraging a new adaptive small-loss regret bound, we propose a computationally efficient policy for the $textsf{Constrained Expert}$ problem, that attains $O(sqrt{Tln N}+T^{beta})$ regret and $tilde{O}(T^{1-beta} ln N)$ CCV for $N$ number of experts. The original problem is then reduced to the $textsf{Constrained Expert}$ problem via a covering argument. Finally, with an additional $M$-smoothness assumption, we propose a computationally efficient first-order policy attaining $O(sqrt{MT}+T^{beta})$ regret and $tilde{O}(MT^{1-beta})$ CCV.
