2003

S. Kakade, M. Kearns, J. Langford, and L. Ortiz, Correlated Equilibria in Graphical Games. EC '03: Proceedings of the 4th ACM Conference on Electronic Commerce, 2003.
We examine correlated equilibria in the recently introduced formalism of graphical games, a succinct representation for multiplayer games. We establish a natural and powerful relationship between the graphical structure of a multiplayer game and a certain Markov network representing distributions over joint actions. Our first main result establishes that this Markov network succinctly represents all correlated equilibria of the graphical game up to expected payoff equivalence. Our second main result provides a general algorithm for computing correlated equilibria in a graphical game based on its associated Markov network. For a special class of graphical games that includes trees, this algorithm runs in time polynomial in the graphical game representation (which is polynomial in the number of players and exponential in the graph degree).
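As a concrete point of reference (this is not the paper's Markov-network algorithm), correlated equilibria can be characterized by linear incentive constraints over a distribution on joint actions. The sketch below solves that standard linear program for a small two-player game; the payoff matrices and the use of scipy are illustrative assumptions, not from the paper.

```python
# A minimal sketch of the standard LP formulation of correlated equilibria,
# solved for the 2x2 game of Chicken. Illustrative only; the paper's own
# algorithm instead exploits the graphical/Markov-network structure.
import numpy as np
from scipy.optimize import linprog

# u[i][a0][a1] is player i's payoff; action 0 = Dare, action 1 = Chicken.
u = np.array([[[0, 7], [2, 6]],      # player 0 (row)
              [[0, 2], [7, 6]]])     # player 1 (column)
n_actions = 2
joint = [(a0, a1) for a0 in range(n_actions) for a1 in range(n_actions)]

# Incentive constraints: no player gains by deviating from a recommendation.
A_ub, b_ub = [], []
for i in range(2):
    for a in range(n_actions):          # recommended action for player i
        for b in range(n_actions):      # candidate deviation
            if a == b:
                continue
            row = np.zeros(len(joint))
            for k, (a0, a1) in enumerate(joint):
                if (a0, a1)[i] != a:
                    continue
                dev = (b, a1) if i == 0 else (a0, b)
                row[k] = u[i][dev] - u[i][(a0, a1)]   # gain from deviating
            A_ub.append(row)
            b_ub.append(0.0)

# Maximize total expected payoff (linprog minimizes, so negate the objective).
c = -np.array([u[0][a] + u[1][a] for a in joint])
res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
              A_eq=np.ones((1, len(joint))), b_eq=[1.0])
for (a0, a1), p in zip(joint, res.x):
    print(f"P(a0={a0}, a1={a1}) = {p:.3f}")
```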
D. Bagnell, S. Kakade, A. Ng, and G. Schneider, Policy Search by Dynamic Programming. Advances in Neural Information Processing Systems 16 (NIPS 2003), 2003.
We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem.
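The following is a minimal tabular sketch of the backward recursion behind this style of policy search, under simplifying assumptions not in the abstract: known dynamics and rewards, and an unrestricted per-timestep policy class. In that special case the baseline distribution plays no role and the update reduces to finite-horizon backward induction; the baseline matters precisely when the policy class is restricted.

```python
# A tabular sketch of the backward policy-construction loop, assuming known
# dynamics P[s, a, s'] and rewards R[s, a]. With an unrestricted policy class
# this is exactly finite-horizon backward induction.
import numpy as np

def backward_policy_search(P, R, horizon):
    """Return a nonstationary policy pi[t, s] for a finite-horizon MDP."""
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)                      # value of the empty tail policy
    pi = np.zeros((horizon, n_states), dtype=int)
    for t in reversed(range(horizon)):          # sweep backwards in time
        Q = R + P @ V                           # Q[s, a] = R[s, a] + E[V(s')]
        pi[t] = Q.argmax(axis=1)                # greedy policy at step t
        V = Q.max(axis=1)                       # value of the new tail policy
    return pi

# Tiny 2-state, 2-action example with made-up dynamics and rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])
print(backward_policy_search(P, R, horizon=5))
```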
S. Kakade, M. Kearns, and J. Langford, Exploration in Metric State Spaces. Proceedings of the 20th International Conference on Machine Learning, 2003.
We present metric-E3, a provably near-optimal algorithm for reinforcement learning in Markov decision processes in which there is a natural metric on the state space that allows the construction of accurate local models. The algorithm is a generalization of the E3 algorithm of Kearns and Singh, and assumes a black box for approximate planning. Unlike the original E3, metric-E3 finds a near-optimal policy in an amount of time that does not directly depend on the size of the state space, but instead depends on the covering number of the state space. Informally, the covering number is the number of neighborhoods required for accurate local modeling.
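To make the covering-number notion concrete (this is not the metric-E3 algorithm itself), the sketch below greedily constructs an epsilon-cover of sampled states under the Euclidean metric; the sample set, dimension, and radius are all assumptions for illustration.

```python
# A sketch of the covering-number idea: greedily pick centers so that every
# sampled state lies within eps of some center. The number of centers is an
# upper bound (within a constant factor) on the eps-covering number.
import numpy as np

def greedy_cover(points, eps):
    """Greedily select centers until every point is within eps of a center."""
    centers = []
    uncovered = np.ones(len(points), dtype=bool)
    while uncovered.any():
        c = points[np.argmax(uncovered)]        # first still-uncovered point
        centers.append(c)
        dist = np.linalg.norm(points - c, axis=1)
        uncovered &= dist > eps                 # its eps-ball is now covered
    return np.array(centers)

rng = np.random.default_rng(0)
states = rng.uniform(size=(500, 2))             # toy 2-D state-space samples
print(len(greedy_cover(states, eps=0.1)), "neighborhoods cover the samples")
```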