A Natural Policy Gradient

Citation:

S. Kakade, "A Natural Policy Gradient," Advances in Neural Information Processing Systems 14 (NIPS 2001), 2002.

Abstract:

We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.
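The core idea in the abstract, following the steepest descent direction under the Fisher information metric of the policy rather than the plain Euclidean gradient, can be illustrated with a small sketch. The code below is not the paper's implementation; it is a minimal illustration for a softmax policy in a single-state (bandit-like) problem, and all concrete names and values (rewards, learning_rate, the step count) are assumptions chosen only for demonstration.

```python
import numpy as np

n_actions = 4
rewards = np.array([1.0, 0.5, 0.2, 0.0])   # assumed expected reward per action
theta = np.zeros(n_actions)                 # policy parameters

def policy(theta):
    """Softmax policy pi(a; theta)."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def grad_log_pi(theta, a):
    """Score function: d/dtheta log pi(a; theta) = e_a - pi."""
    g = -policy(theta)
    g[a] += 1.0
    return g

learning_rate = 0.5
for step in range(200):
    pi = policy(theta)
    # Vanilla policy gradient: grad J = sum_a pi(a) * grad log pi(a) * r(a)
    grad_J = sum(pi[a] * grad_log_pi(theta, a) * rewards[a]
                 for a in range(n_actions))
    # Fisher information matrix: F = sum_a pi(a) * outer(grad log pi(a), grad log pi(a))
    F = sum(pi[a] * np.outer(grad_log_pi(theta, a), grad_log_pi(theta, a))
            for a in range(n_actions))
    # Natural gradient direction: F^{-1} grad J (pseudo-inverse, since F is
    # singular for the over-parameterized softmax).
    natural_grad = np.linalg.pinv(F) @ grad_J
    theta += learning_rate * natural_grad

print("final policy:", np.round(policy(theta), 3))
```

In this toy setting the natural-gradient update concentrates the policy on the highest-reward action, consistent with the abstract's point that the natural gradient moves toward a greedy policy-improvement step rather than merely a slightly better action.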
