Publications by Year: 2013

A. Anandkumar, R. Ge, D. Hsu, and S. M. Kakade, A Tensor Spectral Approach to Learning Mixed Membership Community Models. COLT, 2013 (arXiv report).
Community detection is the task of detecting hidden communities from observed interactions. Guaranteed community detection has so far been mostly limited to models with non-overlapping communities, such as the stochastic block model. In this paper, we remove this restriction and provide guaranteed community detection for a family of probabilistic network models with overlapping communities, termed the mixed membership Dirichlet model, first introduced by Airoldi et al. This model allows nodes to have fractional memberships in multiple communities and assumes that the community memberships are drawn from a Dirichlet distribution. Moreover, it contains the stochastic block model as a special case. We propose a unified approach to learning these models via a tensor spectral decomposition method. Our estimator is based on a low-order moment tensor of the observed network, consisting of 3-star counts. Our learning method is fast and relies on simple linear algebraic operations, e.g., singular value decomposition and tensor power iterations. We provide guaranteed recovery of community memberships and model parameters and present a careful finite sample analysis of our learning method. As an important special case, our results match the best known scaling requirements for the (homogeneous) stochastic block model.
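The linear-algebraic core of the method is simple enough to sketch. Below is a minimal, illustrative Python implementation of the tensor power iteration on a symmetric, orthogonally decomposable third-order tensor; the toy tensor stands in for the paper's (whitened) 3-star moment tensor, and all names and parameters here are assumptions for illustration, not the paper's implementation.

    import numpy as np

    def tensor_apply(T, v):
        # Contract the symmetric 3rd-order tensor along two modes: T(I, v, v).
        return np.einsum('ijk,j,k->i', T, v, v)

    def tensor_power_method(T, n_iters=100, seed=0):
        # Power iteration; for an orthogonally decomposable tensor
        # T = sum_i lam_i a_i (x) a_i (x) a_i, this typically converges
        # to one of the components (lam_i, a_i).
        rng = np.random.default_rng(seed)
        v = rng.normal(size=T.shape[0])
        v /= np.linalg.norm(v)
        for _ in range(n_iters):
            w = tensor_apply(T, v)
            v = w / np.linalg.norm(w)
        return tensor_apply(T, v) @ v, v  # eigenvalue T(v, v, v) and eigenvector

    # Toy tensor with orthonormal components (columns of A).
    rng = np.random.default_rng(1)
    A, _ = np.linalg.qr(rng.normal(size=(5, 5)))
    T = sum(lam * np.einsum('i,j,k->ijk', a, a, a)
            for lam, a in zip([3.0, 2.0, 1.0], A.T))
    lam_hat, v_hat = tensor_power_method(T)

In the paper, a decomposition step of this kind is combined with whitening from second-order moments and deflation; the sketch shows only the iteration itself.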
A. Anandkumar, D. Hsu, M. Janzamin, and S. M. Kakade, When are overcomplete topic models identifiable? Uniqueness of tensor Tucker decompositions with structured sparsity. NIPS, 2013 (arXiv report).
Overcomplete latent representations have been very popular for unsupervised feature learning in recent years. In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. We consider probabilistic admixture or topic models in the overcomplete regime, where the number of latent topics can greatly exceed the size of the observed word vocabulary. While general overcomplete topic models are not identifiable, we establish generic identifiability under a constraint, referred to as topic persistence. Our sufficient conditions for identifiability involve a novel set of "higher order" expansion conditions on the topic-word matrix or the population structure of the model. This set of higher-order expansion conditions allows for overcomplete models and requires the existence of a perfect matching from latent topics to higher order observed words. We establish that random structured topic models are identifiable w.h.p. in the overcomplete regime. Our identifiability results allow for general (non-degenerate) distributions for modeling the topic proportions, and thus we can handle arbitrarily correlated topics in our framework. Our identifiability results imply uniqueness of a class of tensor decompositions with structured sparsity that is contained in the class of Tucker decompositions but is more general than the Candecomp/Parafac (CP) decomposition.
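The matching condition in these expansion conditions is itself easy to check computationally. As a hedged sketch on an invented toy support pattern (not the paper's construction), a perfect matching from topics to observed word tuples can be verified with off-the-shelf bipartite matching:

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import maximum_bipartite_matching

    # Rows index latent topics, columns index higher-order observed word
    # tuples; a 1 marks assumed nonzero support in the toy structure.
    support = csr_matrix(np.array([
        [1, 1, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 1],
    ]))
    match = maximum_bipartite_matching(support, perm_type='column')
    has_perfect_matching = bool((match != -1).all())  # every topic matched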
A. Anandkumar, D. Hsu, A. Javanmard, and S. M. Kakade, Learning Linear Bayesian Networks with Latent Variables. ICML, 2013; Algorithmica special issue on Machine Learning, 2014 (arXiv report).
Unsupervised estimation of latent variable models is a fundamental problem central to numerous applications of machine learning and statistics. This work presents a principled approach for estimating broad classes of such models, including probabilistic topic models and latent linear Bayesian networks, using only second-order observed moments. The sufficient conditions for identifiability of these models are primarily based on weak expansion constraints on the topic-word matrix, for topic models, and on the directed acyclic graph, for Bayesian networks. Because no assumptions are made on the distribution among the latent variables, the approach can handle arbitrary correlations among the topics or latent factors. In addition, a tractable learning method via ℓ1 optimization is proposed and studied in numerical experiments.
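To make the ℓ1-optimization step concrete, here is a generic basis-pursuit sketch (minimize ||z||_1 subject to Az = b), solved as a linear program on synthetic data. This is an assumed, simplified stand-in for the kind of ℓ1 program the paper studies, not its exact estimator.

    import numpy as np
    from scipy.optimize import linprog

    def basis_pursuit(A, b):
        # min ||z||_1 s.t. Az = b, via the split z = z_plus - z_minus
        # with z_plus, z_minus >= 0.
        m, n = A.shape
        c = np.ones(2 * n)                    # objective: sum(z+) + sum(z-)
        A_eq = np.hstack([A, -A])             # A z+ - A z- = b
        res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
        return res.x[:n] - res.x[n:]

    # Toy check: recover a sparse vector from underdetermined measurements.
    rng = np.random.default_rng(0)
    A = rng.normal(size=(15, 30))
    z_true = np.zeros(30)
    z_true[[2, 7, 19]] = [1.5, -2.0, 0.7]
    z_hat = basis_pursuit(A, A @ z_true)      # z_hat should be close to z_true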
P. Dhillon, D. P. Foster, S. M. Kakade, and L. Ungar, A Risk Comparison of Ordinary Least Squares vs Ridge Regression. JMLR, 2013 (arXiv report).
We compare the risk of ridge regression to a simple variant of ordinary least squares, in which one simply projects the data onto a finite dimensional subspace (as specified by a Principal Component Analysis) and then performs an ordinary (un-regularized) least squares regression in this subspace. This note shows that the risk of this ordinary least squares method is within a constant factor (namely 4) of the risk of ridge regression.
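A small sketch of the comparison on assumed synthetic data: ridge regression versus projecting onto the top-k principal components and running unregularized least squares in that subspace. The choices of k and the ridge parameter below are arbitrary and only illustrative.

    import numpy as np

    def ridge(X, y, lam):
        # Standard ridge estimator: (X'X + lam I)^{-1} X'y.
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    def pca_ols(X, y, k):
        # Project onto the top-k principal directions, then plain OLS there.
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        V = Vt[:k].T                                   # d x k basis
        w_sub, *_ = np.linalg.lstsq(X @ V, y, rcond=None)
        return V @ w_sub                               # back to original coords

    rng = np.random.default_rng(0)
    n, d = 200, 50
    X = rng.normal(size=(n, d)) @ np.diag(np.linspace(1.0, 0.01, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + 0.1 * rng.normal(size=n)
    w_ridge, w_pcr = ridge(X, y, lam=1.0), pca_ols(X, y, k=10)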
S. M. Kakade, I. Lobel, and H. Nazerzadeh, Optimal Dynamic Mechanism Design and the Virtual Pivot Mechanism. Operations Research, 2013 (SSRN report).
We consider the problem of designing optimal mechanisms for settings where agents have dynamic private information. We present the Virtual-Pivot Mechanism, which is optimal in a large class of environments that satisfy a separability condition. The mechanism satisfies a rather strong equilibrium notion (it is periodic ex-post incentive compatible and individually rational). We provide both necessary and sufficient conditions for immediate incentive compatibility for mechanisms that satisfy periodic ex-post incentive compatibility in future periods. The result also yields a strikingly simple mechanism for selling a sequence of items to a single buyer. We also show that the allocation rule of the Virtual-Pivot Mechanism has a very simple structure (a Virtual Index) in multi-armed bandit settings. Finally, we show through examples that the relaxation technique we use does not produce optimal dynamic mechanisms in general non-separable environments.
D. Hsu and S. M. Kakade, Learning Mixtures of Spherical Gaussians: Moment Methods and Spectral Decompositions. 4th Innovations in Theoretical Computer Science (ITCS), 2013 (arXiv report).
This work provides a computationally efficient and statistically consistent moment-based estimator for mixtures of spherical Gaussians. Under the condition that component means are in general position, a simple spectral decomposition technique yields consistent parameter estimates from low-order observable moments, without additional minimum separation assumptions needed by previous computationally efficient estimation procedures. Thus computational and information-theoretic barriers to efficient estimation in mixture models are precluded when the mixture components have means in general position and spherical covariances. Some connections are made to estimation problems related to independent component analysis.
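One ingredient of the approach can be sketched directly: for a mixture of spherical Gaussians with a common variance σ² whose centered means span a low-dimensional subspace, σ² shows up as the smallest eigenvalue of the covariance, and the top eigenvectors span the centered means. The toy check below assumes uniform mixing weights and synthetic data; the full estimator in the paper additionally uses third-order moments to recover the means themselves.

    import numpy as np

    rng = np.random.default_rng(0)
    d, k, n, sigma = 10, 3, 50_000, 0.5
    means = rng.normal(size=(k, d))          # component means, general position
    labels = rng.integers(k, size=n)         # uniform mixing weights (assumed)
    X = means[labels] + sigma * rng.normal(size=(n, d))

    cov = np.cov(X, rowvar=False)            # empirical covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    sigma2_hat = eigvals[0]                  # ≈ sigma**2 (smallest eigenvalue)
    means_span = eigvecs[:, -(k - 1):]       # ≈ span of the centered means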
A. Agarwal, D. P. Foster, D. Hsu, S. M. Kakade, and A. Rakhlin, Stochastic convex optimization with bandit feedback. SIAM Journal on Optimization (SIOPT), 2013; earlier version in NIPS 2011 (arXiv report).
This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set X under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value f(x) at any query point x ∈ X. The quantity of interest is the regret of the algorithm, which is the sum of the function values at the algorithm's query points minus the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs Õ(poly(d)√T) regret. Since any algorithm has regret at least Ω(√T) on this problem, our algorithm is optimal in terms of the scaling with T.
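For intuition about the feedback model (though not the paper's ellipsoid-style algorithm), here is a sketch of the classical one-point gradient estimator of Flaxman, Kalai, and McMahan on an assumed toy objective over the unit ball; all constants are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    d, T, delta, eta = 5, 10_000, 0.1, 0.005
    x_star = np.ones(d) / np.sqrt(d)                # unknown minimizer
    f = lambda x: np.linalg.norm(x - x_star) ** 2   # convex loss
    oracle = lambda x: f(x) + 0.1 * rng.normal()    # bandit feedback: noisy value only

    x = np.zeros(d)
    for t in range(T):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                      # random unit direction
        y = oracle(x + delta * u)                   # single noisy query per round
        g = (d / delta) * y * u                     # one-point gradient estimate
        x = x - eta * g
        x /= max(1.0, np.linalg.norm(x))            # project back to the unit ball
        # (a careful version keeps the query x + delta*u inside the feasible set)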