Policy Search by Dynamic Programming

Citation:

D. Bagnell, S. Kakade, A. Ng, and G. Schneider, "Policy Search by Dynamic Programming," Advances in Neural Information Processing Systems 16 (NIPS 2003), 2003.

Abstract:

We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem.
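
To make the abstract's idea concrete, below is a minimal sketch of baseline-weighted backward policy selection in Python. Everything in it (the tiny fully observed MDP, the uniform baseline distribution, the constant-action policy class) is an assumption made for illustration; the paper's algorithm applies to POMDPs and richer policy classes.

```python
# A minimal, hypothetical sketch of the backward, baseline-weighted policy
# selection described in the abstract -- not the authors' implementation.
# The tiny MDP, the restricted policy class, and all parameter values below
# are invented for illustration.
import numpy as np

S, A, T = 4, 2, 5                 # states, actions, horizon (all assumed)
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = next-state distribution
R = rng.random((S, A))                      # R[s, a] = immediate reward
mu = np.full((T, S), 1.0 / S)     # baseline distribution over states at each t

V = np.zeros(S)                   # value-to-go under the policies chosen so far
policies = [0] * T
for t in reversed(range(T)):      # work backwards, as in dynamic programming
    Q = R + P @ V                 # Q[s, a] = R[s, a] + E[V(s') | s, a]
    # Restrict the policy class to constant-action policies so the baseline
    # distribution mu[t] genuinely matters when scoring each candidate.
    scores = mu[t] @ Q            # expected Q under mu[t], one score per action
    policies[t] = int(np.argmax(scores))
    V = Q[:, policies[t]]         # value-to-go for the next (earlier) step

print(policies)                   # one chosen policy (here: action) per timestep
```

The loop runs exactly T times and picks one policy per timestep, which is the sense in which the procedure terminates in a finite number of steps; the baseline distribution enters only in how candidate policies are scored.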
