2005
E. Even-Dar, S. M. Kakade, and Y. Mansour, Reinforcement Learning in POMDPs Without Resets. IJCAI'05: Proceedings of the 19th International Joint Conference on Artificial Intelligence, 2005.
We consider the most realistic reinforcement learning setting in which an agent starts in an unknown environment (the POMDP) and must follow one continuous and uninterrupted chain of experience with no access to "resets" or "offline" simulation. We provide algorithms for general connected POMDPs that obtain near optimal average reward. One algorithm we present has a convergence rate which depends exponentially on a certain horizon time of an optimal policy, but has no dependence on the number of (unobservable) states. The main building block of our algorithms is an implementation of an approximate reset strategy, which we show always exists in every POMDP. An interesting aspect of our algorithms is how they use this strategy when balancing exploration and exploitation.
E. Even-Dar, S. M. Kakade, and Y. Mansour, Planning in POMDPs Using Multiplicity Automata. Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI 2005), 2005.
Planning and learning in Partially Observable MDPs (POMDPs) are among the most challenging tasks in both the AI and Operations Research communities. Although solutions to these problems are intractable in general, there might be special cases, such as structured POMDPs, which can be solved efficiently. A natural and possibly efficient way to represent a POMDP is through the predictive state representation (PSR), a representation that has recently been receiving increasing attention. In this work, we relate POMDPs to multiplicity automata, showing that POMDPs can be represented by multiplicity automata with no increase in the representation size. Furthermore, we show that the size of the multiplicity automaton is equal to the rank of the predictive state representation. Therefore, we relate both the predictive state representation and POMDPs to the well-founded multiplicity automata literature. Based on the multiplicity automata representation, we provide a planning algorithm which is exponential only in the multiplicity automata rank rather than the number of states of the POMDP. As a result, whenever the predictive state representation is logarithmic in the standard POMDP representation, our planning algorithm is efficient.
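As a hedged aside on the underlying object (not the paper's planning algorithm): a multiplicity automaton of rank r assigns each symbol an r x r matrix and computes the value of a string as a vector-matrix-vector product, f(w) = alpha^T A_{w1} ... A_{wn} beta. The sketch below, with illustrative names not taken from the paper, shows this evaluation:

```python
import numpy as np

# Minimal sketch of a multiplicity automaton (MA), assuming the standard
# definition f(w) = alpha^T * A[w1] @ ... @ A[wn] @ beta, with one r x r
# matrix per symbol. Class and variable names are illustrative.
class MultiplicityAutomaton:
    def __init__(self, alpha, matrices, beta):
        self.alpha = np.asarray(alpha, dtype=float)   # initial weights, shape (r,)
        self.matrices = {s: np.asarray(M, dtype=float) for s, M in matrices.items()}
        self.beta = np.asarray(beta, dtype=float)     # final weights, shape (r,)

    def value(self, word):
        v = self.alpha
        for symbol in word:                           # one matrix product per symbol
            v = v @ self.matrices[symbol]
        return float(v @ self.beta)

# Rank-2 example: f(w) counts occurrences of the symbol 'a' in w.
ma = MultiplicityAutomaton(
    alpha=[1.0, 0.0],
    matrices={"a": [[1, 1], [0, 1]], "b": [[1, 0], [0, 1]]},
    beta=[0.0, 1.0],
)
print(ma.value("abab"))  # 2.0
```

The paper's planning algorithm is exponential only in this rank r, which is why a PSR that is compact relative to the state count yields efficient planning.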
S. M. Kakade, M. Kearns, L. Ortiz, R. Pemantle, and S. Suri, The Economic Properties of Social Networks. Advances in Neural Information Processing Systems 17 (NIPS 2004), 2005.
We examine the marriage of recent probabilistic generative models for social networks with classical frameworks from mathematical economics. We are particularly interested in how the statistical structure of such networks influences global economic quantities such as price variation. Our findings are a mixture of formal analysis, simulation, and experiments on an international trade data set from the United Nations.
S. M. Kakade and M. Kearns, Trading in Markovian Price Models. 18th Annual Conference on Learning Theory (COLT 2005), 2005.
We examine a Markovian model for the price evolution of a stock, in which the probability of local upward or downward movement is arbitrarily dependent on the current price itself (and perhaps some auxiliary state information). This model directly and considerably generalizes many of the most well-studied price evolution models in classical finance, including a variety of random walk, drift and diffusion models. Our main result is a “universally profitable” trading strategy — a single fixed strategy whose profitability competes with the optimal strategy (which knows all of the underlying parameters of the infinite and possibly nonstationary Markov process).
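For intuition about the model class (not the paper's trading strategy): the price is a Markov chain whose up/down probability depends only on the current price. A minimal simulation sketch follows; the specific probability map and function names are hypothetical.

```python
import random

def up_probability(price):
    # Hypothetical price-dependent upward-move probability; in the paper's
    # model this map can be arbitrary. Here: mean-reverting toward 100.
    return 0.5 + 0.4 * (100 - price) / 100

def simulate(price=100, steps=10, tick=1, seed=0):
    """Simulate a Markovian price walk in which the probability of a local
    up or down move depends only on the current price."""
    rng = random.Random(seed)
    path = [price]
    for _ in range(steps):
        price += tick if rng.random() < up_probability(price) else -tick
        path.append(price)
    return path

print(simulate())
```

Random walks and drift models are special cases in which up_probability is constant or varies slowly with price.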
E. Even-Dar, S. M. Kakade, and Y. Mansour, Experts in a Markov Decision Process. Advances in Neural Information Processing Systems 17 (NIPS 2004), 2005.
We consider an MDP setting in which the reward function is allowed to change during each time step of play (possibly in an adversarial manner), yet the dynamics remain fixed. Similar to the experts setting, we address the question of how well an agent can do when compared to the reward achieved under the best stationary policy over time. We provide efficient algorithms, which have regret bounds with no dependence on the size of the state space. Instead, these bounds depend only on a certain horizon time of the process and logarithmically on the number of actions. We also show that when the dynamics change over time, the problem becomes computationally hard.
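As a hedged sketch of the experts-style building block such algorithms rely on: a multiplicative-weights (Hedge) update over actions, whose regret scales logarithmically in the number of actions. This is the standard generic update, not necessarily the paper's exact algorithm.

```python
import math

def hedge_update(weights, rewards, eta=0.1):
    """One multiplicative-weights (Hedge) step over actions: the kind of
    experts update run per state in experts-in-MDP approaches. Generic
    sketch; eta is a learning-rate parameter."""
    new = [w * math.exp(eta * r) for w, r in zip(weights, rewards)]
    total = sum(new)
    return [w / total for w in new]

# Three actions; action 2 keeps paying off, so its weight grows.
w = [1 / 3] * 3
for _ in range(50):
    w = hedge_update(w, rewards=[0.1, 0.2, 0.9])
print([round(x, 3) for x in w])  # mass concentrates on the best action
```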
S. M. Kakade and A. Y. Ng, Online Bounds for Bayesian Algorithms. NIPS'04: Proceedings of the 17th International Conference on Neural Information Processing Systems, 2005.
We present a competitive analysis of Bayesian learning algorithms in the online learning setting and show that many simple Bayesian algorithms (such as Gaussian linear regression and Bayesian logistic regression) perform favorably when compared, in retrospect, to the single best model in the model class. The analysis does not assume that the Bayesian algorithms’ modeling assumptions are “correct,” and our bounds hold even if the data is adversarially chosen. For Gaussian linear regression (using log loss), our error bounds are comparable to the best bounds in the online learning literature, and we also provide a lower bound showing that Gaussian linear regression is optimal in a certain worst-case sense. We also give bounds for some widely used maximum a posteriori (MAP) estimation algorithms, including regularized logistic regression.
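For concreteness, here is the standard conjugate online update for Gaussian linear regression, the kind of simple Bayesian algorithm the analysis covers. The code is a generic sketch with assumed variable names and hyperparameters, not taken from the paper.

```python
import numpy as np

def bayes_linreg_online(xs, ys, prior_var=1.0, noise_var=1.0):
    """Online Bayesian linear regression with a Gaussian prior: the
    posterior precision and mean statistics are updated one example at
    a time (standard conjugate updates). Generic illustrative sketch."""
    d = xs.shape[1]
    precision = np.eye(d) / prior_var       # prior precision
    b = np.zeros(d)
    for x, y in zip(xs, ys):
        precision += np.outer(x, x) / noise_var
        b += y * x / noise_var
    mean = np.linalg.solve(precision, b)    # posterior mean
    return mean, np.linalg.inv(precision)   # posterior mean and covariance

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -0.5]) + 0.1 * rng.normal(size=200)
mean, cov = bayes_linreg_online(X, y)
print(mean)  # close to [1.5, -0.5]
```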