Publications

2007
S. M. Kakade and D. P. Foster, “Multi-View Regression via Canonical Correlation Analysis,” International Conference on Computational Learning Theory, vol. 4539, pp. 82–96, 2007.
In the multi-view regression problem, we have a regression problem where the input variable (which is a real vector) can be partitioned into two different views, where it is assumed that either view of the input is sufficient to make accurate predictions — this is essentially (a significantly weaker version of) the co-training assumption for the regression problem. We provide a semi-supervised algorithm which first uses unlabeled data to learn a norm (or, equivalently, a kernel) and then uses labeled data in a ridge regression algorithm (with this induced norm) to provide the predictor. The unlabeled data is used via canonical correlation analysis (CCA, which is closely related to PCA for two random variables) to derive an appropriate norm over functions. We are able to characterize the intrinsic dimensionality of the subsequent ridge regression problem (which uses this norm) by the correlation coefficients provided by CCA in a rather simple expression. Interestingly, the norm used by the ridge regression algorithm is derived from CCA, unlike in standard kernel methods where a special a priori norm is assumed (i.e. a Banach space is assumed). We discuss how this result shows that unlabeled data can decrease the sample complexity.
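As a minimal numerical sketch of the CCA step (the toy two-view data, the whitening-based CCA routine, and the variable names below are my own, not from the paper): canonical correlation coefficients are estimated from unlabeled paired views, and then used to weight a per-direction ridge penalty, so that directions strongly correlated across views are penalized least.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5
z = rng.normal(size=(n, 1))                                       # shared latent signal
X = z @ rng.normal(size=(1, d)) + 0.1 * rng.normal(size=(n, d))   # view one
Y = z @ rng.normal(size=(1, d)) + 0.1 * rng.normal(size=(n, d))   # view two

def whiten(A, eps=1e-12):
    """Center A and transform it to have (approximately) identity covariance."""
    A = A - A.mean(axis=0)
    evals, evecs = np.linalg.eigh(A.T @ A / len(A))
    W = evecs @ np.diag(1.0 / np.sqrt(np.maximum(evals, eps))) @ evecs.T
    return A @ W

# Classical CCA: the singular values of the whitened cross-covariance
# are the canonical correlation coefficients.
Xw, Yw = whiten(X), whiten(Y)
corrs = np.clip(np.linalg.svd(Xw.T @ Yw / n, compute_uv=False), 0.0, 1.0)

# One natural CCA-induced ridge penalty per canonical direction (in the
# spirit of the paper's norm): high-correlation directions cost little.
penalties = (1.0 - corrs) / np.maximum(corrs, 1e-12)
```

On this toy data the top canonical correlation is close to 1, since both views share the same one-dimensional latent signal, so ridge regression in the canonical basis is nearly unpenalized along that direction.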
E. Even-Dar, S. M. Kakade, and Y. Mansour, “The Value of Observation for Monitoring Dynamic Systems,” IJCAI'07: Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 2474–2479, 2007.
We consider the fundamental problem of monitoring (i.e. tracking) the belief state in a dynamic system, when the model is only approximately correct and when the initial belief state might be unknown. In this general setting where the model is (perhaps only slightly) mis-specified, monitoring (and consequently planning) may be impossible as errors might accumulate over time. We provide a new characterization, the value of observation, which allows us to bound the error accumulation. The value of observation is a parameter that governs how much information the observation provides. For instance, in Partially Observable MDPs (POMDPs), when the parameter is 1 the POMDP is an MDP, while for an unobservable Markov decision process the parameter is 0. Thus, the new parameter characterizes a spectrum from MDPs to unobservable MDPs depending on the amount of information conveyed in the observations.
L. Ortiz, R. Schapire, and S. M. Kakade, “Maximum Entropy Correlated Equilibria,” AISTATS, 2007.
We study maximum entropy correlated equilibria in (multi-player) games and provide two gradient-based algorithms that are guaranteed to converge to such equilibria. Although we do not provide convergence rates for these algorithms, they do have strong connections to other algorithms (such as iterative scaling) which are effective heuristics for tasks such as statistical estimation.
2006
A. Beygelzimer, S. M. Kakade, and J. Langford, “Cover Trees for Nearest Neighbor,” ICML '06: Proceedings of the 23rd International Conference on Machine Learning, 2006.
We present a tree data structure for fast nearest neighbor operations in general n-point metric spaces (where the data set consists of n points). The data structure requires O(n) space regardless of the metric's structure yet maintains all performance properties of a navigating net (Krauthgamer & Lee, 2004b). If the point set has a bounded expansion constant c, which is a measure of the intrinsic dimensionality, as defined in (Karger & Ruhl, 2002), the cover tree data structure can be constructed in O(c^6 n log n) time. Furthermore, nearest neighbor queries require time only logarithmic in n, in particular O(c^12 log n) time. Our experimental results show speedups over brute force search varying between one and several orders of magnitude on natural machine learning datasets.
E. Even-Dar, S. M. Kakade, M. Kearns, and Y. Mansour, “(In)Stability Properties of Limit Order Dynamics,” ACM Conference on Electronic Commerce, 2006.
We study the stability properties of the dynamics of the standard continuous limit-order mechanism that is used in modern equity markets. We ask whether such mechanisms are susceptible to “butterfly effects” — the infliction of large changes on common measures of market activity by only small perturbations of the order sequence. We show that the answer depends strongly on whether the market consists of “absolute” traders (who determine their prices independent of the current order book state) or “relative” traders (who determine their prices relative to the current bid and ask). We prove that while the absolute model enjoys provably strong stability properties, the relative model is vulnerable to great instability. Our theoretical results are supported by large-scale experiments using limit order data from INET, a large electronic exchange for NASDAQ stocks.
S. M. Kakade and D. P. Foster, “Calibration via Regression,” 2006 IEEE Information Theory Workshop (ITW '06), Punta del Este, 2006.
In the online prediction setting, the concept of calibration entails having the empirical (conditional) frequencies match the claimed predicted probabilities. This contrasts with more traditional online prediction goals of getting a low cumulative loss. The differences between these goals have typically made them hard to compare with each other. This paper shows how to get an approximate form of calibration out of a traditional online loss minimization algorithm, namely online regression. As a corollary, we show how to construct calibrated forecasts on a collection of subsequences.
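A toy illustration of the idea that a loss-minimizing forecaster can end up approximately calibrated (this construction, the Bernoulli sequence, and the thresholds are my own, not the paper's algorithm): an online least-squares forecaster on i.i.d. binary outcomes, here just the running mean, produces forecasts whose empirical frequencies approximately match the predicted probabilities.

```python
import numpy as np

rng = np.random.default_rng(1)
T, p_true = 5000, 0.7
forecast = 0.5                       # initial probability forecast
preds, outcomes = [], []
for t in range(1, T + 1):
    preds.append(forecast)           # forecast is made before seeing the outcome
    y = float(rng.random() < p_true)
    outcomes.append(y)
    forecast += (y - forecast) / t   # online least-squares update (running mean)

preds, outcomes = np.array(preds), np.array(outcomes)
# Calibration check: on rounds where the forecast was near 0.7, the
# empirical frequency of 1s should also be near 0.7.
near = np.abs(preds - p_true) < 0.05
gap = abs(outcomes[near].mean() - preds[near].mean())
```

The gap between claimed and empirical frequency shrinks as T grows; the paper's contribution is obtaining such guarantees from a general online regression algorithm, including on arbitrary (non-i.i.d.) sequences and collections of subsequences.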
S. M. Kakade and A. Kalai, “From Batch to Transductive Online Learning,” Advances in Neural Information Processing Systems 18 (NIPS 2005), 2006.
It is well-known that everything that is learnable in the difficult online setting, where an arbitrary sequence of examples must be labeled one at a time, is also learnable in the batch setting, where examples are drawn independently from a distribution. We show a result in the opposite direction. We give an efficient conversion algorithm from batch to online that is transductive: it uses future unlabeled data. This demonstrates the equivalence between what is properly and efficiently learnable in a batch model and a transductive online model.
S. M. Kakade, M. W. Seeger, and D. P. Foster, “Worst-Case Bounds for Gaussian Process Models,” Advances in Neural Information Processing Systems 18 (NIPS 2005), 2006.
We present a competitive analysis of some non-parametric Bayesian algorithms in a worst-case online learning setting, where no probabilistic assumptions about the generation of the data are made. We consider models which use a Gaussian process prior (over the space of all functions) and provide bounds on the regret (under the log loss) for commonly used non-parametric Bayesian algorithms — including Gaussian regression and logistic regression — which show how these algorithms can perform favorably under rather general conditions. These bounds explicitly handle the infinite dimensionality of these non-parametric classes in a natural way. We also make formal connections to the minimax and minimum description length (MDL) framework. Here, we show precisely how Bayesian Gaussian regression is a minimax strategy.
2005
E. Even-Dar, S. M. Kakade, and Y. Mansour, “Reinforcement Learning in POMDPs Without Resets,” IJCAI'05: Proceedings of the 19th International Joint Conference on Artificial Intelligence, 2005.
We consider the most realistic reinforcement learning setting in which an agent starts in an unknown environment (the POMDP) and must follow one continuous and uninterrupted chain of experience with no access to "resets" or "offline" simulation. We provide algorithms for general connected POMDPs that obtain near optimal average reward. One algorithm we present has a convergence rate which depends exponentially on a certain horizon time of an optimal policy, but has no dependence on the number of (unobservable) states. The main building block of our algorithms is an implementation of an approximate reset strategy, which we show always exists in every POMDP. An interesting aspect of our algorithms is how they use this strategy when balancing exploration and exploitation.
E. Even-Dar, S. M. Kakade, and Y. Mansour, “Planning in POMDPs Using Multiplicity Automata,” Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI 2005), 2005.
Planning and learning in Partially Observable MDPs (POMDPs) are among the most challenging tasks in both the AI and Operations Research communities. Although these problems are intractable in general, there might be special cases, such as structured POMDPs, which can be solved efficiently. A natural and possibly efficient way to represent a POMDP is through the predictive state representation (PSR), a representation which has recently been receiving increasing attention. In this work, we relate POMDPs to multiplicity automata, showing that POMDPs can be represented by multiplicity automata with no increase in the representation size. Furthermore, we show that the size of the multiplicity automaton is equal to the rank of the predictive state representation. Therefore, we relate both the predictive state representation and POMDPs to the well-founded multiplicity automata literature. Based on the multiplicity automata representation, we provide a planning algorithm which is exponential only in the multiplicity automata rank rather than the number of states of the POMDP. As a result, whenever the predictive state representation is logarithmic in the standard POMDP representation, our planning algorithm is efficient.
S. M. Kakade, M. Kearns, L. Ortiz, R. Pemantle, and S. Suri, “The Economic Properties of Social Networks,” Advances in Neural Information Processing Systems 17 (NIPS 2004), 2005.
We examine the marriage of recent probabilistic generative models for social networks with classical frameworks from mathematical economics. We are particularly interested in how the statistical structure of such networks influences global economic quantities such as price variation. Our findings are a mixture of formal analysis, simulation, and experiments on an international trade data set from the United Nations.
S. M. Kakade and M. Kearns, “Trading in Markovian Price Models,” 18th Annual Conference on Learning Theory (COLT 2005), 2005.
We examine a Markovian model for the price evolution of a stock, in which the probability of local upward or downward movement is arbitrarily dependent on the current price itself (and perhaps some auxiliary state information). This model directly and considerably generalizes many of the most well-studied price evolution models in classical finance, including a variety of random walk, drift and diffusion models. Our main result is a “universally profitable” trading strategy — a single fixed strategy whose profitability competes with the optimal strategy (which knows all of the underlying parameters of the infinite and possibly nonstationary Markov process).
E. Even-Dar, S. M. Kakade, and Y. Mansour, “Experts in a Markov Decision Process,” Advances in Neural Information Processing Systems 17 (NIPS 2004), 2005.
We consider an MDP setting in which the reward function is allowed to change during each time step of play (possibly in an adversarial manner), yet the dynamics remain fixed. Similar to the experts setting, we address the question of how well an agent can do compared to the reward achieved under the best stationary policy over time. We provide efficient algorithms, which have regret bounds with no dependence on the size of the state space. Instead, these bounds depend only on a certain horizon time of the process and logarithmically on the number of actions. We also show that in the case where the dynamics change over time, the problem becomes computationally hard.
S. M. Kakade and A. Y. Ng, “Online Bounds for Bayesian Algorithms,” NIPS'04: Proceedings of the 17th International Conference on Neural Information Processing Systems, 2005.
We present a competitive analysis of Bayesian learning algorithms in the online learning setting and show that many simple Bayesian algorithms (such as Gaussian linear regression and Bayesian logistic regression) perform favorably when compared, in retrospect, to the single best model in the model class. The analysis does not assume that the Bayesian algorithms’ modeling assumptions are “correct,” and our bounds hold even if the data is adversarially chosen. For Gaussian linear regression (using log loss), our error bounds are comparable to the best bounds in the online learning literature, and we also provide a lower bound showing that Gaussian linear regression is optimal in a certain worst-case sense. We also give bounds for some widely used maximum a posteriori (MAP) estimation algorithms, including regularized logistic regression.
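A small simulation in this spirit (the data model, prior, and thresholds below are my own choices): online Bayesian/ridge linear regression predicts with the current posterior mean, suffers squared loss, then updates, and its cumulative loss stays close to that of the best fixed weight vector chosen in hindsight.

```python
import numpy as np

rng = np.random.default_rng(2)
T, d = 2000, 3
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(T, d))
y = X @ w_true + 0.3 * rng.normal(size=T)

# Online ridge regression (the MAP / posterior-mean predictor under a
# N(0, I) prior): predict, suffer loss, then update the sufficient stats.
A = np.eye(d)                   # prior precision plus observed outer products
b = np.zeros(d)
online_loss = 0.0
for x, yt in zip(X, y):
    w = np.linalg.solve(A, b)   # current posterior mean
    online_loss += (x @ w - yt) ** 2
    A += np.outer(x, x)
    b += yt * x

# Best single model in hindsight, and the regret against it.
w_best = np.linalg.lstsq(X, y, rcond=None)[0]
best_loss = float(((X @ w_best - y) ** 2).sum())
regret = online_loss - best_loss
```

The regret here stays small relative to the total loss and is dominated by the early rounds, consistent with the logarithmic-in-T regret guarantees the paper proves for Gaussian linear regression.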
2004
S. Kakade, M. Kearns, and L. Ortiz, “Graphical Economics,” International Conference on Computational Learning Theory, 2004.
We introduce a graph-theoretic generalization of classical Arrow-Debreu economics, in which an undirected graph specifies which consumers or economies are permitted to engage in direct trade, and the graph topology may give rise to local variations in the prices of commodities. Our main technical contributions are: (1) a general existence theorem for graphical equilibria, which require local markets to clear; (2) an improved algorithm for computing approximate equilibria in standard (non-graphical) economies, which generalizes the algorithm of Deng et al. [2002] to non-linear utility functions; (3) an algorithm for computing equilibria in the graphical setting, which runs in time polynomial in the number of consumers in the special but important case in which the graph is a tree (again permitting non-linear utility functions). We also highlight many interesting learning problems that arise in our model, and relate them to learning in standard game theory and economics, graphical games, and graphical models for probabilistic inference.
S. Kakade, M. Kearns, Y. Mansour, and L. Ortiz, “Competitive Algorithms for VWAP and Limit Order Trading,” Proceedings of the ACM Electronic Commerce Conference, 2004.
We introduce new online models for two important aspects of modern financial markets: Volume Weighted Average Price trading and limit order books. We provide an extensive study of competitive algorithms in these models and relate them to earlier online algorithms for stock trading.
S. M. Kakade and D. P. Foster, “Deterministic Calibration and Nash Equilibrium,” International Conference on Computational Learning Theory, 2004.
We provide a natural learning process in which the joint frequency of (time-averaged) empirical play converges into the set of convex combinations of Nash equilibria. Furthermore, the actual distribution of players' actions is close to some (approximate) Nash equilibria on most rounds (on all but a vanishing fraction of the rounds). In this process, all players rationally choose their actions using a public prediction made by a deterministic, weakly calibrated algorithm. For this to be possible, we show that such a deterministic (weakly) calibrated learning algorithm exists.
2003
S. Kakade, M. Kearns, J. Langford, and L. Ortiz, “Correlated Equilibria in Graphical Games,” EC '03: Proceedings of the 4th ACM Conference on Electronic Commerce, 2003.
We examine correlated equilibria in the recently introduced formalism of graphical games, a succinct representation for multiplayer games. We establish a natural and powerful relationship between the graphical structure of a multiplayer game and a certain Markov network representing distributions over joint actions. Our first main result establishes that this Markov network succinctly represents all correlated equilibria of the graphical game up to expected payoff equivalence. Our second main result provides a general algorithm for computing correlated equilibria in a graphical game based on its associated Markov network. For a special class of graphical games that includes trees, this algorithm runs in time polynomial in the graphical game representation (which is polynomial in the number of players and exponential in the graph degree).
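For concreteness, here is the standard incentive-constraint check that defines a correlated equilibrium, on a tiny (non-graphical) two-player game of Chicken; the payoffs, the candidate distribution, and the helper function are illustrative and are not the paper's Markov-network algorithm.

```python
import numpy as np

# Chicken: actions 0 = Dare, 1 = Chicken; entry [a_row, a_col] is a payoff.
R = np.array([[0.0, 7.0], [2.0, 6.0]])   # row player's payoffs
C = np.array([[0.0, 2.0], [7.0, 6.0]])   # column player's payoffs
P = np.array([[0.0, 1/3], [1/3, 1/3]])   # candidate joint distribution

def ce_violation(P, R, C):
    """Largest expected gain any player gets by deviating from its
    recommended action; <= 0 means P is a correlated equilibrium."""
    worst = 0.0
    for a in range(2):            # row player recommended a, deviation b
        for b in range(2):
            worst = max(worst, float(np.dot(P[a], R[b] - R[a])))
    for a in range(2):            # column player recommended a, deviation b
        for b in range(2):
            worst = max(worst, float(np.dot(P[:, a], C[:, b] - C[:, a])))
    return worst

violation = ce_violation(P, R, C)   # 0.0: P is a correlated equilibrium
```

Putting all mass on (Dare, Dare) instead yields a strictly positive violation, so the check separates equilibria from non-equilibria; the paper's contribution is representing and computing such distributions succinctly for many-player graphical games.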
D. Bagnell, S. Kakade, A. Ng, and G. Schneider, “Policy Search by Dynamic Programming,” Advances in Neural Information Processing Systems 16 (NIPS 2003), 2003.
We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem.
S. M. Kakade, “On the Sample Complexity of Reinforcement Learning,” PhD thesis, 2003.

This thesis is a detailed investigation into the following question: how much data must an agent collect in order to perform "reinforcement learning" successfully? This question is analogous to the classical issue of the sample complexity in supervised learning, but is harder because of the increased realism of the reinforcement learning setting. This thesis summarizes recent sample complexity results in the reinforcement learning literature and builds on these results to provide novel algorithms with strong performance guarantees.

We focus on a variety of reasonable performance criteria and sampling models by which agents may access the environment. For instance, in a policy search setting, we consider the problem of how much simulated experience is required to reliably choose a "good" policy among a restricted class of policies \Pi (as in Kearns, Mansour, and Ng [2000]). In a more online setting, we consider the case in which an agent is placed in an environment and must follow one unbroken chain of experience with no access to "offline" simulation (as in Kearns and Singh [1998]).

We build on the sample based algorithms suggested by Kearns, Mansour, and Ng [2000]. Their sample complexity bounds have no dependence on the size of the state space, an exponential dependence on the planning horizon time, and linear dependence on the complexity of \Pi . We suggest novel algorithms with more restricted guarantees whose sample complexities are again independent of the size of the state space and depend linearly on the complexity of the policy class \Pi , but have only a polynomial dependence on the horizon time. We pay particular attention to the tradeoffs made by such algorithms.
