High-Probability Regret Bounds for Bandit Online Linear Optimization

Citation:

P. Bartlett, V. Dani, T. Hayes, S. Kakade, A. Rakhlin, and A. Tewari, “High-Probability Regret Bounds for Bandit Online Linear Optimization,” 21st Annual Conference on Learning Theory - COLT 2008, Helsinki, Finland, pp. 335-342, 2008.

Abstract:

We present a modification of the algorithm of Dani et al. (9) for the online linear optimization problem in the bandit setting, which with high probability has regret at most $O(\sqrt{T})$ against an adaptive adversary. This improves on the previous algorithm (9), whose regret is bounded in expectation against an oblivious adversary. We obtain the same dependence on the dimension ($n^{3/2}$) as that exhibited by Dani et al. The results of this paper rest firmly on those of (9) and the remarkable technique of Auer et al. (2) for obtaining high-probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information vs. bandit settings.
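For readers skimming the listing, the guarantee stated in the abstract can be written out explicitly. The display below uses standard bandit notation (a decision set $\mathcal{K} \subset \mathbb{R}^n$, loss vectors $f_t$, and horizon $T$); it paraphrases the abstract's claim rather than reproducing a formula from the paper, and logarithmic factors are suppressed.

% Regret over horizon T: the learner plays x_t in K and observes
% only the scalar loss f_t^T x_t (the bandit feedback model).
\[
  \mathrm{Regret}_T \;=\; \sum_{t=1}^{T} f_t^{\top} x_t \;-\; \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t^{\top} x .
\]
% The abstract's claim: with high probability, against an adaptive
% adversary, and up to logarithmic factors,
\[
  \mathrm{Regret}_T \;=\; O\!\left( n^{3/2} \sqrt{T} \right).
\]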

