Policy Search by Dynamic Programming

Citation:

D. Bagnell, S. Kakade, A. Ng, and G. Schneider, Policy Search by Dynamic Programming. Advances in Neural Information Processing Systems 16 (NIPS 2003), 2003.

Abstract:

We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem.

Sham M. Kakade
