Stochastic convex optimization with bandit feedback

Citation:

A. Agarwal, D. P. Foster, D. Hsu, S. M. Kakade, and A. Rakhlin, Stochastic convex optimization with bandit feedback. SIAM Journal on Optimization (SIOPT), 2013 and NIPS 2011: ArXiv Report, 2013.

Abstract:

This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set $\xset$ under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value f(x) at any query point $x \in \xset$. The quantity of interest is the regret of the algorithm, which is the sum of the function values at algorithm's query points minus the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs $\otil(\poly(d)\sqrt{T})$ regret. Since any algorithm has regret at least Ω(T‾‾√) on this problem, our algorithm is optimal in terms of the scaling with T.

Publisher's Version

See also: 2013
Last updated on 10/10/2021