Publications by Year: 2020

2020
N. Agarwal, S. Kakade, R. Kidambi, Y. T. Lee, P. Netrapalli, and A. Sidford, Leverage Score Sampling for Faster Accelerated Regression and Empirical Risk Minimization. ALT: ArXiv Report, 2020. Publisher's Version. Abstract:
Given a matrix A ∈ ℝ^{n×d} and a vector b ∈ ℝ^n, we show how to compute an ϵ-approximate solution to the regression problem min_{x∈ℝ^d} (1/2)‖Ax − b‖_2^2 in time Õ((n + √(d·κ_sum))·s·log ϵ^{−1}), where κ_sum = tr(A^⊤A)/λ_min(A^⊤A) and s is the maximum number of non-zero entries in a row of A. Our algorithm improves upon the previous best running time of Õ((n + √(n·κ_sum))·s·log ϵ^{−1}). We achieve our result through a careful combination of leverage score sampling techniques, proximal point methods, and accelerated coordinate descent. Our method not only matches the performance of previous methods, but further improves whenever the leverage scores of rows are small (up to polylogarithmic factors). We also provide a non-linear generalization of these results that improves the running time for solving a broader class of ERM problems.
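For illustration, a minimal NumPy sketch of the leverage-score sampling primitive this line of work builds on (not the paper's accelerated solver): rows are sampled in proportion to their leverage scores and reweighted to form a smaller surrogate regression problem.

```python
# Minimal sketch of the leverage-score sampling primitive (not the paper's
# accelerated algorithm): sample rows proportionally to their leverage scores
# and reweight them to form a spectral sketch of the regression problem.
import numpy as np

def leverage_scores(A):
    # l_i = a_i^T (A^T A)^+ a_i, computed directly for clarity (not for speed).
    G_pinv = np.linalg.pinv(A.T @ A)
    return np.einsum("ij,jk,ik->i", A, G_pinv, A)

def leverage_score_sample(A, b, m, rng=np.random.default_rng(0)):
    scores = leverage_scores(A)
    probs = scores / scores.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=probs)
    w = 1.0 / np.sqrt(m * probs[idx])           # importance weights
    return A[idx] * w[:, None], b[idx] * w      # sketched system

# Usage: solve the sketched least-squares problem as a cheap surrogate.
A = np.random.randn(2000, 20)
b = np.random.randn(2000)
A_s, b_s = leverage_score_sample(A, b, m=200)
x_hat, *_ = np.linalg.lstsq(A_s, b_s, rcond=None)
```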
C. Ilin, S. E. Annan-Phan, X. H. Tai, S. Mehra, S. M. Hsiang, and J. E. Blumenstock, Public Mobility Data Enables COVID-19 Forecasting and Management at Local and Global Scales, vol. 28120. NBER: ArXiv Report, 2020. Publisher's Version. Abstract:
Policymakers everywhere are working to determine the set of restrictions that will effectively contain the spread of COVID-19 without excessively stifling economic activity. We show that publicly available data on human mobility — collected by Google, Facebook, and other providers — can be used to evaluate the effectiveness of non-pharmaceutical interventions and forecast the spread of COVID-19. This approach relies on simple and transparent statistical models, and involves minimal assumptions about disease dynamics. We demonstrate the effectiveness of this approach using local and regional data from China, France, Italy, South Korea, and the United States, as well as national data from 80 countries around the world.
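As a toy illustration (not the paper's actual specification), the kind of simple, transparent statistical model described might look like the following: regress the daily growth rate of confirmed cases on lagged mobility, then forecast forward while holding mobility at its last observed level.

```python
# Toy illustration only: a reduced-form regression of case growth on lagged
# mobility, with a naive forward projection. Variable names and the lag are
# assumptions for the sketch, not the paper's specification.
import numpy as np

def fit_growth_model(cases, mobility, lag=7):
    # growth_t ~ alpha + beta * mobility_{t - lag}
    growth = np.diff(np.log(cases))
    X = np.column_stack([np.ones(len(growth) - lag), mobility[:len(growth) - lag]])
    y = growth[lag:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [alpha, beta]

def forecast(cases, mobility, coef, horizon=14):
    alpha, beta = coef
    g = alpha + beta * mobility[-1]       # hold mobility at its last observed level
    return np.exp(np.log(cases[-1]) + g * np.arange(1, horizon + 1))

# Synthetic usage.
T = 60
mobility = np.linspace(1.0, 0.4, T)                      # declining mobility index
cases = np.cumsum(np.exp(0.05 * np.arange(T))) + 100.0   # growing case counts
coef = fit_growth_model(cases, mobility)
print(forecast(cases, mobility, coef, horizon=7))
```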
C. Jin, S. M. Kakade, A. Krishnamurthy, and Q. Liu, Sample-Efficient Reinforcement Learning of Undercomplete POMDPs. NeurIPS: ArXiv Report, 2020. Publisher's Version. Abstract:
Partial observability is a common challenge in many reinforcement learning applications, which requires an agent to maintain memory, infer latent states, and integrate this past information into exploration. This challenge leads to a number of computational and statistical hardness results for learning general Partially Observable Markov Decision Processes (POMDPs). This work shows that these hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of POMDPs. In particular, we present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works. OOM-UCB achieves an optimal sample complexity of Õ(1/ε²) for finding an ε-optimal policy, along with being polynomial in all other relevant quantities. As an interesting special case, we also provide a computationally and statistically efficient algorithm for POMDPs with deterministic state transitions.
C. Wei, S. Kakade, and T. Ma, The Implicit and Explicit Regularization Effects of Dropout. NeurIPS: ArXiv Report, 2020. Publisher's Version. Abstract:
Dropout is a widely-used regularization technique, often required to obtain state-of-the-art results for a number of architectures. This work demonstrates that dropout introduces two distinct but entangled regularization effects: an explicit effect (also studied in prior work), which occurs because dropout modifies the expected training objective, and, perhaps surprisingly, an additional implicit effect from the stochasticity in the dropout training update. This implicit regularization effect is analogous to the effect of stochasticity in small mini-batch stochastic gradient descent. We disentangle these two effects through controlled experiments. We then derive analytic simplifications which characterize each effect in terms of the derivatives of the model and the loss, for deep neural networks. We demonstrate that these simplified, analytic regularizers accurately capture the important aspects of dropout, showing that they can faithfully replace dropout in practice.
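A toy numerical illustration of the explicit effect in a linear-regression special case (not the paper's deep-network derivation): averaging the dropout objective over masks adds a data-dependent, ridge-like penalty to the clean squared loss; the implicit effect is what remains when training on single stochastic masks rather than on this averaged objective.

```python
# Toy check of dropout's *explicit* regularization effect in a linear model
# (an illustration under simplifying assumptions, not the paper's derivation):
# the mask-averaged objective equals the clean squared loss plus a
# data-dependent ridge-like penalty.
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 256, 8, 0.5                      # p is the drop probability
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
w = rng.normal(size=d)

def dropout_loss(mask):
    # inverted dropout on the inputs: keep with prob 1 - p, rescale by 1/(1 - p)
    return 0.5 * np.mean(((X * mask) / (1 - p) @ w - y) ** 2)

mc = np.mean([dropout_loss(rng.random((n, d)) > p) for _ in range(5000)])
clean = 0.5 * np.mean((X @ w - y) ** 2)
penalty = 0.5 * (p / (1 - p)) * np.mean((X ** 2) @ (w ** 2))
print(mc, clean + penalty)                 # the two agree up to Monte Carlo error
```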
A. Agarwal, S. M. Kakade, J. D. Lee, and G. Mahajan, Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes. COLT: ArXiv Report, 2020. Publisher's Version. Abstract:
Policy gradient methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces. However, little is known about even their most basic theoretical convergence properties, including: if and how fast they converge to a globally optimal solution, and how they cope with approximation error due to using a restricted class of parametric policies. This work provides provable characterizations of the computational, approximation, and sample size properties of policy gradient methods in the context of discounted Markov Decision Processes (MDPs). We focus on both: "tabular" policy parameterizations, where the optimal policy is contained in the class and where we show global convergence to the optimal policy; and parametric policy classes (considering both log-linear and neural policy classes), which may not contain the optimal policy and where we provide agnostic learning results. One central contribution of this work is in providing approximation guarantees that are average case -- which avoid explicit worst-case dependencies on the size of the state space -- by making a formal connection to supervised learning under distribution shift. This characterization shows an important interplay between estimation error, approximation error, and exploration (as characterized through a precisely defined condition number).
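A toy sketch of the tabular setting discussed above (a small randomly generated MDP; the sizes and step size are illustrative, not the paper's analysis): exact policy-gradient ascent on a softmax policy, the parameterization for which global convergence is established.

```python
# Toy sketch: exact softmax policy-gradient ascent on a small random MDP.
import numpy as np

S, A, gamma, eta = 3, 2, 0.9, 1.0
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over next states
R = rng.random((S, A))                        # rewards in [0, 1]
rho = np.ones(S) / S                          # start-state distribution
theta = np.zeros((S, A))                      # softmax policy parameters

def softmax_policy(theta):
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def evaluate(pi):
    P_pi = np.einsum("sa,san->sn", pi, P)
    r_pi = np.einsum("sa,sa->s", pi, R)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = R + gamma * P @ V
    d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * rho)  # discounted visitation
    return V, Q, d

for _ in range(5000):
    pi = softmax_policy(theta)
    V, Q, d = evaluate(pi)
    adv = Q - (pi * Q).sum(axis=1, keepdims=True)
    theta += eta * (d[:, None] / (1 - gamma)) * pi * adv   # exact softmax policy gradient

V, _, _ = evaluate(softmax_policy(theta))
print("value of learned policy:", rho @ V)
```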
K. Zhang, S. M. Kakade, T. Başar, and L. F. Yang, Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity. NeurIPS: ArXiv Report, 2020. Publisher's Version. Abstract:
Model-based reinforcement learning (RL), which finds an optimal policy using an empirical model, has long been recognized as one of the cornerstones of RL. It is especially suitable for multi-agent RL (MARL), as it naturally decouples the learning and the planning phases, and avoids the non-stationarity problem when all agents are improving their policies simultaneously using samples. Though intuitive and widely-used, the sample complexity of model-based MARL algorithms has not been fully investigated. In this paper, our goal is to address the fundamental question about its sample complexity. We study arguably the most basic MARL setting: two-player discounted zero-sum Markov games, given only access to a generative model. We show that model-based MARL achieves a sample complexity of Õ(|S||A||B|(1−γ)^{−3}ϵ^{−2}) for finding the Nash equilibrium (NE) value up to some ϵ error, and the ϵ-NE policies with a smooth planning oracle, where γ is the discount factor and S, A, B denote the state space and the action spaces of the two agents. We further show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge, by establishing a matching lower bound. This is in contrast to the usual reward-aware setting, with a Ω̃(|S|(|A|+|B|)(1−γ)^{−3}ϵ^{−2}) lower bound, where this model-based approach is near-optimal with only a gap in the |A|, |B| dependence. Our results not only demonstrate the sample-efficiency of this basic model-based approach in MARL, but also elaborate on the fundamental tradeoff between its power (easily handling the more challenging reward-agnostic case) and its limitation (less adaptive and suboptimal in |A|, |B|), which arises particularly in the multi-agent context.
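A minimal sketch of the basic pipeline analyzed here, with illustrative sizes and a toy ground-truth model (not the paper's certified algorithm or its smooth planning oracle): query a generative model N times per (s, a, b), build the empirical transition model, and run value iteration in which each state solves a zero-sum matrix game.

```python
# Minimal sketch of model-based planning for a zero-sum Markov game built from
# a generative model. All sizes are illustrative; the matrix game at each state
# is solved with a small LP.
import numpy as np
from scipy.optimize import linprog

S, A, B, gamma, N = 4, 2, 2, 0.9, 200
rng = np.random.default_rng(0)
P_true = rng.dirichlet(np.ones(S), size=(S, A, B))   # ground-truth transitions (toy)
R = rng.random((S, A, B))

# Empirical model from N generative-model queries per (s, a, b).
P_hat = np.zeros_like(P_true)
for s in range(S):
    for a in range(A):
        for b in range(B):
            samples = rng.choice(S, size=N, p=P_true[s, a, b])
            P_hat[s, a, b] = np.bincount(samples, minlength=S) / N

def matrix_game_value(Q):
    # max_x min_b sum_a x_a Q[a, b]  (row player maximizes)
    c = np.zeros(A + 1)
    c[-1] = -1.0                                      # maximize v
    A_ub = np.hstack([-Q.T, np.ones((B, 1))])         # v - x^T Q[:, b] <= 0
    A_eq = np.hstack([np.ones((1, A)), np.zeros((1, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(B), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * A + [(None, None)])
    return res.x[-1]

V = np.zeros(S)
for _ in range(200):                                  # value iteration on the empirical model
    Q = R + gamma * P_hat @ V                         # shape (S, A, B)
    V = np.array([matrix_game_value(Q[s]) for s in range(S)])
print("approximate NE value at each state:", V)
```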
N. Kohli, E. Aiken, and J. E. Blumenstock, Privacy Guarantees for Personal Mobility Data in Humanitarian Response. KDD 2020 Workshop on Humanitarian Mapping, San Diego, CA, 2020. Publisher's Version
A. Agarwal, M. Henaff, S. Kakade, and W. Sun, PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning. NeurIPS: ArXiv Report, 2020. Publisher's Version. Abstract:
Direct policy gradient methods for reinforcement learning are a successful approach for a variety of reasons: they are model free, they directly optimize the performance metric of interest, and they allow for richly parameterized policies. Their primary drawback is that, by being local in nature, they fail to adequately explore the environment. In contrast, while model-based approaches and Q-learning directly handle exploration through the use of optimism, their ability to handle model misspecification and function approximation is far less evident. This work introduces the Policy Cover-Policy Gradient (PC-PG) algorithm, which provably balances the exploration vs. exploitation tradeoff using an ensemble of learned policies (the policy cover). PC-PG enjoys polynomial sample complexity and run time for both tabular MDPs and, more generally, linear MDPs in an infinite dimensional RKHS. Furthermore, PC-PG also has strong guarantees under model misspecification that go beyond the standard worst case ℓ∞ assumptions; this includes approximation guarantees for state aggregation under an average case error assumption, along with guarantees under a more general assumption where the approximation error under distribution shift is controlled. We complement the theory with empirical evaluation across a variety of domains in both reward-free and reward-driven settings.
A. Agarwal, S. Kakade, A. Krishnamurthy, and W. Sun, FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs. NeurIPS: ArXiv Report, 2020. Publisher's Version. Abstract:
In order to deal with the curse of dimensionality in reinforcement learning (RL), it is common practice to make parametric assumptions where values or policies are functions of some low dimensional feature space. This work focuses on the representation learning question: how can we learn such features? Under the assumption that the underlying (unknown) dynamics correspond to a low rank transition matrix, we show how the representation learning question is related to a particular non-linear matrix decomposition problem. Structurally, we make precise connections between these low rank MDPs and latent variable models, showing how they significantly generalize prior formulations for representation learning in RL. Algorithmically, we develop FLAMBE, which engages in exploration and representation learning for provably efficient RL in low rank transition models.
R. Wang, S. S. Du, L. F. Yang, and S. M. Kakade, Is Long Horizon Reinforcement Learning More Difficult Than Short Horizon Reinforcement Learning? NeurIPS: ArXiv Report, 2020. Publisher's Version. Abstract:
Learning to plan for long horizons is a central challenge in episodic reinforcement learning problems. A fundamental question is to understand how the difficulty of the problem scales as the horizon increases. Here the natural measure of sample complexity is a normalized one: we are interested in the number of episodes it takes to provably discover a policy whose value is within ε of the optimal value, where the value is measured by the normalized cumulative reward in each episode. In a COLT 2018 open problem, Jiang and Agarwal conjectured that, for tabular, episodic reinforcement learning problems, there exists a sample complexity lower bound which exhibits a polynomial dependence on the horizon -- a conjecture which is consistent with all known sample complexity upper bounds. This work refutes this conjecture, proving that tabular, episodic reinforcement learning is possible with a sample complexity that scales only logarithmically with the planning horizon. In other words, when the values are appropriately normalized (to lie in the unit interval), this result shows that long horizon RL is no more difficult than short horizon RL, at least in a minimax sense. Our analysis introduces two ideas: (i) the construction of an ε-net for optimal policies whose log-covering number scales only logarithmically with the planning horizon, and (ii) the Online Trajectory Synthesis algorithm, which adaptively evaluates all policies in a given policy class using sample complexity that scales with the log-covering number of the given policy class. Both may be of independent interest.
S. Kakade, P. Liang, V. Sharan, and G. Valiant, Prediction with a Short Memory. STOC: ArXiv Report, 2020. Publisher's Version. Abstract:
We consider the problem of predicting the next observation given a sequence of past observations, and consider the extent to which accurate prediction requires complex algorithms that explicitly leverage long-range dependencies. Perhaps surprisingly, our positive results show that for a broad class of sequences, there is an algorithm that predicts well on average, and bases its predictions only on the most recent few observations together with a set of simple summary statistics of the past observations. Specifically, we show that for any distribution over observations, if the mutual information between past observations and future observations is upper bounded by I, then a simple Markov model over the most recent I/ϵ observations obtains expected KL error ϵ, and hence ℓ1 error √ϵ, with respect to the optimal predictor that has access to the entire past and knows the data generating distribution. For a Hidden Markov Model with n hidden states, I is bounded by log n, a quantity that does not depend on the mixing time, and we show that the trivial prediction algorithm based on the empirical frequencies of length O(log n/ϵ) windows of observations achieves this error, provided the length of the sequence is d^{Ω(log n/ϵ)}, where d is the size of the observation alphabet. We also establish that this result cannot be improved upon, even for the class of HMMs, in the following two senses: First, for HMMs with n hidden states, a window length of log n/ϵ is information-theoretically necessary to achieve expected ℓ1 error √ϵ. Second, the d^{Θ(log n/ϵ)} samples required to estimate the Markov model for an observation alphabet of size d are necessary for any computationally tractable learning algorithm, assuming the hardness of strongly refuting a certain class of CSPs.
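A minimal sketch (on synthetic data, not the paper's HMM experiments) of the window-based predictor the abstract refers to: estimate next-symbol probabilities from the empirical frequencies of length-L windows.

```python
# Minimal sketch of the "trivial" window-based predictor: next-symbol
# probabilities are estimated from empirical frequencies of length-L windows.
from collections import Counter, defaultdict
import numpy as np

def window_predictor(sequence, L):
    counts = defaultdict(Counter)
    for i in range(L, len(sequence)):
        counts[tuple(sequence[i - L:i])][sequence[i]] += 1
    def predict(context):
        c = counts.get(tuple(context[-L:]))
        if not c:
            return None                      # unseen window: fall back as desired
        total = sum(c.values())
        return {sym: n / total for sym, n in c.items()}
    return predict

# Usage: train on a synthetic binary sequence, then query the predictor.
rng = np.random.default_rng(0)
seq = list(rng.integers(0, 2, size=10000))
predict = window_predictor(seq, L=5)
print(predict(seq[-5:]))
```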
S. Kakade, A. Krishnamurthy, K. Lowrey, M. Ohnishi, and W. Sun, Information Theoretic Regret Bounds for Online Nonlinear Control. NeurIPS: ArXiv Report, 2020. Publisher's Version. Abstract:

This work studies the problem of sequential control in an unknown, nonlinear dynamical system, where we model the underlying system dynamics as an unknown function in a known Reproducing Kernel Hilbert Space. This framework yields a general setting that permits discrete and continuous control inputs as well as non-smooth, non-differentiable dynamics. Our main result, the Lower Confidence-based Continuous Control (LC3) algorithm, enjoys a near-optimal O(√T) regret bound against the optimal controller in episodic settings, where T is the number of episodes. The bound has no explicit dependence on the dimension of the system dynamics, which could be infinite, but instead only depends on information theoretic quantities. We empirically show its application to a number of nonlinear control tasks and demonstrate the benefit of exploration for learning model dynamics.

Project website.

V. Ramanujan and R. Somani, Soft Threshold Weight Reparameterization for Learnable Sparsity. ICML: ArXiv Report, 2020. Publisher's Version. Abstract:
Sparsity in Deep Neural Networks (DNNs) is studied extensively with a focus on maximizing prediction accuracy given an overall parameter budget. Existing methods rely on uniform or heuristic non-uniform sparsity budgets which have sub-optimal layer-wise parameter allocation, resulting in a) lower prediction accuracy or b) higher inference cost (FLOPs). This work proposes Soft Threshold Reparameterization (STR), a novel use of the soft-threshold operator on DNN weights. STR smoothly induces sparsity while learning pruning thresholds, thereby obtaining a non-uniform sparsity budget. Our method achieves state-of-the-art accuracy for unstructured sparsity in CNNs (ResNet50 and MobileNetV1 on ImageNet-1K), and, additionally, learns non-uniform budgets that empirically reduce the FLOPs by up to 50%. Notably, STR boosts the accuracy over existing results by up to 10% in the ultra sparse (99%) regime and can also be used to induce low-rank (structured sparsity) in RNNs. In short, STR is a simple mechanism which learns effective sparsity budgets that contrast with popular heuristics. Code, pretrained models, and sparsity budgets are available at this https URL.
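A PyTorch-style sketch of the reparameterization as described in the abstract (simplified; the layer size, initialization, and choice of sigmoid for the threshold function are illustrative assumptions): the effective weight is sign(W)·ReLU(|W| − g(s)) with a learnable per-layer threshold parameter s.

```python
# Simplified sketch of soft-threshold weight reparameterization: the effective
# weight is sign(W) * relu(|W| - sigmoid(s)), with s learned per layer, so a
# non-uniform sparsity budget emerges during training.
import torch
import torch.nn as nn
import torch.nn.functional as F

class STRLinear(nn.Module):
    def __init__(self, in_features, out_features, s_init=-5.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.s = nn.Parameter(torch.tensor(s_init))    # learnable per-layer threshold

    def sparse_weight(self):
        thresh = torch.sigmoid(self.s)
        return torch.sign(self.weight) * F.relu(self.weight.abs() - thresh)

    def forward(self, x):
        return F.linear(x, self.sparse_weight(), self.bias)

# Usage: the sparsity of the effective weights can be read off directly.
layer = STRLinear(128, 64)
x = torch.randn(8, 128)
y = layer(x)
sparsity = (layer.sparse_weight() == 0).float().mean().item()
print(y.shape, f"layer sparsity: {sparsity:.2%}")
```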
W. Kong, R. Somani, Z. Song, S. Kakade, and S. Oh, Robust Meta-learning for Mixed Linear Regression with Small Batches. NeurIPS: ArXiv Report, 2020. Publisher's Version. Abstract:
A common challenge faced in practical supervised learning, such as medical image processing and robotic interactions, is that there are plenty of tasks but each task cannot afford to collect enough labeled examples to be learned in isolation. However, by exploiting the similarities across those tasks, one can hope to overcome such data scarcity. Under a canonical scenario where each task is drawn from a mixture of k linear regressions, we study a fundamental question: can abundant small-data tasks compensate for the lack of big-data tasks? Existing second-moment-based approaches show that such a trade-off is efficiently achievable, with the help of medium-sized tasks with Ω(k^{1/2}) examples each. However, this algorithm is brittle in two important scenarios. The predictions can be arbitrarily bad (i) even with only a few outliers in the dataset; or (ii) even if the medium-sized tasks are slightly smaller, with o(k^{1/2}) examples each. We introduce a spectral approach that is simultaneously robust under both scenarios. To this end, we first design a novel outlier-robust principal component analysis algorithm that achieves optimal accuracy. This is followed by a sum-of-squares algorithm to exploit the information from higher order moments. Together, this approach is robust against outliers and achieves a graceful statistical trade-off; the lack of Ω(k^{1/2})-size tasks can be compensated for with smaller tasks, which can now be as small as O(log k).
M. Braverman, X. Chen, S. M. Kakade, K. Narasimhan, C. Zhang, and Y. Zhang, Calibration, Entropy Rates, and Memory in Language Models. ICML: ArXiv Report, 2020. Publisher's Version. Abstract:
Building accurate language models that capture meaningful long-term dependencies is a core challenge in natural language processing. Towards this end, we present a calibration-based approach to measure long-term discrepancies between a generative sequence model and the true distribution, and use these discrepancies to improve the model. Empirically, we show that state-of-the-art language models, including LSTMs and Transformers, are miscalibrated: the entropy rates of their generations drift dramatically upward over time. We then provide provable methods to mitigate this phenomenon. Furthermore, we show how this calibration-based approach can also be used to measure the amount of memory that language models use for prediction.
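A sketch of the diagnostic described above, assuming a Hugging Face-style causal language model interface (the `.logits` attribute and sampling loop are assumptions of the sketch, not the paper's code): track the per-step entropy of the predictive distribution during generation and check whether it drifts upward.

```python
# Sketch of an entropy-rate drift diagnostic for an autoregressive LM.
# Assumes a model whose forward pass returns an object with a .logits tensor
# of shape (batch, seq_len, vocab), as in Hugging Face-style causal LMs.
import torch
import torch.nn.functional as F

@torch.no_grad()
def generation_entropy_curve(model, input_ids, steps=200):
    entropies = []
    ids = input_ids
    for _ in range(steps):
        logits = model(ids).logits[:, -1, :]           # next-token logits
        logp = F.log_softmax(logits, dim=-1)
        entropies.append(-(logp.exp() * logp).sum(-1).mean().item())
        next_id = torch.multinomial(logp.exp(), num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)
    return entropies  # an upward trend indicates the miscalibration discussed above
```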
J. E. Blumenstock, “Machine learning can help get COVID-19 aid to those who need it most,” Nature, vol. 581, no. 7807, 2020. Publisher's Version
D. Siddarth, et al., Mitigate/Suppress/Maintain: Local Targets for Victory Over COVID. Edmond J. Safra Center for Ethics: Rapid Response Initiative, 2020. Publisher's Version. Abstract:
There is growing consensus around a strategy centered on testing, tracing, and supported isolation (TTSI) to suppress COVID, save lives, and safely reopen the economy. Given the high prevalence the disease has reached in many OECD countries, this requires the expansion of TTSI inputs to a scale and with a speed that have little precedent (Siddarth and Weyl, 2020). As scarcity of testing supplies is expected to remain during the build-up to a surge, authorities will need to allocate these tests within and across localities to minimize loss of life. This paper documents a preliminary framework that aims to provide such guidance to multiscale geographies, in conjunction with our previous recommendations. Given unfortunate limits in current testing capacity, we describe a testing and tracing regime in which orders of magnitude fewer resources can be used to suppress the disease. Such suppression should be rapidly scaled in low-prevalence areas (within and across regions) to the extent that those areas can be insulated from other areas. In particular, if travel restrictions are used to allow asynchronous suppression, and if logistics permit the use of mobile resources, a smaller number of tests per day may be required. At the same time, vulnerable communities and essential workforces must be protected across regions, as prescribed in Phase I of the Roadmap to Pandemic Resilience (Allen et al., 2020).
D. S. Allen, A. Bassuk, S. Block, G. J. Busenberg, and M. Charpignon, Pandemic Resilience: Getting it Done. Edmond J. Safra Center for Ethics: Rapid Response Initiative, 2020. Publisher's Version. Abstract:
On April 27, the CDC [Centers for Disease Control and Prevention] changed its guidance to support broader use of testing not only for therapeutic purposes, but also for disease control. In the most recent guidance, released May 3, first priority goes to hospitalized patients, first responders with symptoms, and residents in congregate living contexts with symptoms. But there is now also a second priority category that includes asymptomatic individuals from groups experiencing disparate impacts of the disease and 'persons without symptoms who are prioritized by health departments or clinicians, for any reason, including but not limited to: public health monitoring, sentinel surveillance, or 'screening of other asymptomatic individuals according to state and local plans' (bold in original, italics added). The last phrase supports broad testing of contacts of COVID [coronavirus disease]-positive individuals and of essential workers, even when they have mild symptoms or none at all. This Supplement to our Roadmap to Pandemic Resilience offers guidance to help state and local governments develop TTSI (testing, tracing, and supported isolation) programs in support of such testing for purposes of disease control and suppression.
V. Hart, et al., Outpacing the Virus: Digital Response to Containing the Spread of COVID-19 while Mitigating Privacy Risks. Edmond J. Safra Center for Ethics: Rapid Response Initiative, 2020. Publisher's Version. Abstract:
There is a growing consensus that we must use a combined strategy of medical and technological tools to provide a response at a scale that can outpace the speed and proliferation of the SARS-CoV-2 virus. A process of identifying exposed individuals who have come into contact with diagnosed individuals, called “contact tracing,” has been shown to effectively enable suppression of new cases of SARS-CoV-2 (COVID-19). Important concerns around protecting patients' confidentiality and civil liberties, together with a lack of familiarity with available privacy-protecting technologies, have both led to suboptimal privacy implementations and hindered adoption. This paper reviews the trade-offs of these methods, their techniques, the necessary rate of adoption, and critical security and privacy controls and concerns for an information system that can accelerate medical response. Proactive use of intentionally designed technology can support voluntary participation from the public toward the goals of smart testing, effective resource allocation, and relaxing some physical distancing measures, but only when it guarantees and assures an individual's complete control over disclosure and the use of data in a way that protects individual rights.
J. Chan, et al., “PACT: Privacy Sensitive Protocols and Mechanisms for Mobile Contact Tracing,” IEEE Bulletin on Data Engineering, pp. 15-35, 2020. Publisher's Version. Abstract:
The global health threat from COVID-19 has been controlled in a number of instances by large-scale testing and contact tracing efforts. We created this document to suggest three functionalities for how we might best harness computing technologies to support the goals of public health organizations in minimizing morbidity and mortality associated with the spread of COVID-19, while protecting the civil liberties of individuals. In particular, this work advocates for a third-party-free approach to assisted mobile contact tracing, because such an approach mitigates the security and privacy risks of requiring a trusted third party. We also explicitly consider the inferential risks involved in any contact tracing system, where any alert to a user could itself give rise to de-anonymizing information. More generally, we hope to participate in bringing together colleagues in industry, academia, and civil society to discuss and converge on ideas around a critical issue rising with attempts to mitigate the COVID-19 pandemic.
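An illustrative sketch (not the PACT protocol specification; key lengths, epoch counts, and the hash construction are assumptions) of the third-party-free pattern the paper advocates: phones broadcast pseudorandom ephemeral chirps derived from locally held seeds, a user who tests positive publishes only their seeds, and other phones check locally for matches against chirps they overheard.

```python
# Illustrative sketch of third-party-free proximity matching: no central server
# ever learns who was near whom; only seeds of positive users are published.
import hashlib
import os

def chirps_from_seed(seed: bytes, epochs: int):
    # Derive one ephemeral chirp per time epoch from a locally stored seed.
    return [hashlib.sha256(seed + epoch.to_bytes(4, "big")).hexdigest()[:16]
            for epoch in range(epochs)]

# Phone A generates a seed and broadcasts chirps; Phone B records what it hears.
seed_a = os.urandom(16)
broadcast = chirps_from_seed(seed_a, epochs=96)           # e.g. one day of epochs
heard_by_b = set(broadcast[40:44])                        # B was near A for 4 epochs

# A tests positive and publishes only the seed (no identity, no location).
published_seeds = [seed_a]

# B checks locally whether any published seed regenerates a chirp it overheard.
exposed = any(set(chirps_from_seed(s, 96)) & heard_by_b for s in published_seeds)
print("exposure detected:", exposed)
```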
