ML-Social: Publications and Preprints

2021
E. Aiken, G. Bedoya, A. Coville, and J. E. Blumenstock, “Targeting Development Aid with Machine Learning and Mobile Phone Data: Evidence from an Anti-Poverty Intervention in Afghanistan,” ACM SIGCAS Computing and Sustainable Societies (COMPASS '20), 2021.
Can mobile phone data improve program targeting? By combining rich survey data from a “big push” anti-poverty program in Afghanistan with detailed mobile phone logs from program beneficiaries, we study the extent to which machine learning methods can accurately differentiate ultra-poor households eligible for program benefits from ineligible households. We show that supervised learning methods leveraging mobile phone data can identify ultra-poor households nearly as accurately as survey-based measures of consumption and wealth; and that combining survey-based measures with mobile phone data produces classifications more accurate than those based on a single data source.
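The headline comparison, single-source versus combined features, can be mimicked on synthetic data with a toy nearest-centroid classifier. Everything below (the features, noise levels, and the classifier itself) is illustrative and not the paper's data or model; it only shows why two weak signals can beat either one alone:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Synthetic households: label 1 = ultra-poor. Two weakly informative
    feature sources stand in for survey measures and phone-usage metrics."""
    y = rng.integers(0, 2, n)
    survey = y + rng.normal(0, 1.5, n)   # noisy survey-based signal
    phone = y + rng.normal(0, 1.5, n)    # noisy phone-metadata signal
    return np.column_stack([survey, phone]), y

def nearest_centroid_acc(Xtr, ytr, Xte, yte):
    """Train a nearest-centroid classifier and return test accuracy."""
    c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    pred = (np.linalg.norm(Xte - c1, axis=1)
            < np.linalg.norm(Xte - c0, axis=1)).astype(int)
    return (pred == yte).mean()

Xtr, ytr = make_data(4000)
Xte, yte = make_data(4000)
acc_survey = nearest_centroid_acc(Xtr[:, :1], ytr, Xte[:, :1], yte)
acc_phone = nearest_centroid_acc(Xtr[:, 1:], ytr, Xte[:, 1:], yte)
acc_both = nearest_centroid_acc(Xtr, ytr, Xte, yte)
```

With independent noise on each source, combining the two feature blocks averages the noise down and lifts accuracy above either single-source classifier.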
2020
S. Kakade, P. Liang, V. Sharan, and G. Valiant, “Prediction with a Short Memory,” STOC; arXiv report, 2020.
We consider the problem of predicting the next observation given a sequence of past observations, and consider the extent to which accurate prediction requires complex algorithms that explicitly leverage long-range dependencies. Perhaps surprisingly, our positive results show that for a broad class of sequences, there is an algorithm that predicts well on average, and bases its predictions only on the most recent few observations together with a set of simple summary statistics of the past observations. Specifically, we show that for any distribution over observations, if the mutual information between past observations and future observations is upper bounded by I, then a simple Markov model over the most recent I/ϵ observations obtains expected KL error ϵ, and hence ℓ1 error √ϵ, with respect to the optimal predictor that has access to the entire past and knows the data-generating distribution. For a Hidden Markov Model with n hidden states, I is bounded by log n, a quantity that does not depend on the mixing time, and we show that the trivial prediction algorithm based on the empirical frequencies of length-O(log n/ϵ) windows of observations achieves this error, provided the length of the sequence is d^Ω(log n/ϵ), where d is the size of the observation alphabet. We also establish that this result cannot be improved upon, even for the class of HMMs, in the following two senses: first, for HMMs with n hidden states, a window length of log n/ϵ is information-theoretically necessary to achieve expected ℓ1 error √ϵ. Second, the d^Θ(log n/ϵ) samples required to estimate the Markov model for an observation alphabet of size d are necessary for any computationally tractable learning algorithm, assuming the hardness of strongly refuting a certain class of CSPs.
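The "trivial" windowed predictor from the abstract, predicting from empirical frequencies of short windows, fits in a few lines. The window length below simply stands in for the O(log n/ϵ) of the theorem; the sequence is made up:

```python
from collections import Counter, defaultdict

def window_predictor(seq, ell):
    """Estimate empirical frequencies of length-ell contexts and return a
    predictor that outputs the most frequent next symbol for a context."""
    counts = defaultdict(Counter)
    for i in range(ell, len(seq)):
        counts[tuple(seq[i - ell:i])][seq[i]] += 1

    def predict(context):
        ctx = tuple(context[-ell:])
        if ctx not in counts:
            return None
        return counts[ctx].most_common(1)[0][0]

    return predict

# A period-3 sequence is captured perfectly by length-2 contexts.
predict = window_predictor("abcabcabcabc", 2)
```

On sequences with short effective memory, such as the periodic one above, this empirical-frequency predictor already matches the optimal predictor.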
W. Kong, R. Somani, Z. Song, S. Kakade, and S. Oh, “Robust Meta-learning for Mixed Linear Regression with Small Batches,” NeurIPS; arXiv report, 2020.
A common challenge faced in practical supervised learning, such as medical image processing and robotic interactions, is that there are plenty of tasks but each task cannot afford to collect enough labeled examples to be learned in isolation. However, by exploiting the similarities across those tasks, one can hope to overcome such data scarcity. Under a canonical scenario where each task is drawn from a mixture of k linear regressions, we study a fundamental question: can abundant small-data tasks compensate for the lack of big-data tasks? Existing second-moment-based approaches show that such a trade-off is efficiently achievable, with the help of medium-sized tasks with Ω(k^{1/2}) examples each. However, this algorithm is brittle in two important scenarios. The predictions can be arbitrarily bad (i) even with only a few outliers in the dataset; or (ii) even if the medium-sized tasks are slightly smaller, with o(k^{1/2}) examples each. We introduce a spectral approach that is simultaneously robust under both scenarios. To this end, we first design a novel outlier-robust principal component analysis algorithm that achieves optimal accuracy. This is followed by a sum-of-squares algorithm to exploit the information from higher-order moments. Together, this approach is robust against outliers and achieves a graceful statistical trade-off; the lack of Ω(k^{1/2})-size tasks can be compensated for with smaller tasks, which can now be as small as O(log k).
W. Kong, R. Somani, Z. Song, S. Kakade, and S. Oh, “Meta-learning for mixed linear regression,” ICML; arXiv report, 2020.
In modern supervised learning, there are a large number of tasks, but many of them are associated with only a small amount of labeled data. These include data from medical image processing and robotic interaction. Even though each individual task cannot be meaningfully trained in isolation, one seeks to meta-learn across the tasks from past experiences by exploiting some similarities. We study a fundamental question of interest: when can abundant tasks with small data compensate for a lack of tasks with big data? We focus on a canonical scenario where each task is drawn from a mixture of k linear regressions, and identify sufficient conditions for such a graceful exchange to hold; the total number of examples necessary with only small-data tasks scales similarly as when big-data tasks are available. To this end, we introduce a novel spectral approach and show that we can efficiently utilize small-data tasks with the help of Ω̃(k^{3/2}) medium-data tasks, each with Ω̃(k^{1/2}) examples.
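The second-moment idea behind such spectral approaches can be sketched directly: averaging y1·y2·x1·x2ᵀ across two-example tasks exposes span{β_c}, which a top-eigenvector computation recovers. The sketch below (k = 2 planted regressors, all sizes and noise levels illustrative) is a minimal instance of this estimator, not the paper's full algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 4, 20000
betas = np.array([[2.0, 0, 0, 0], [0, 2.0, 0, 0]])  # k = 2 components

# Each task contributes two labeled examples drawn from one component.
z = rng.integers(0, 2, T)                  # hidden component per task
X1, X2 = rng.normal(size=(T, d)), rng.normal(size=(T, d))
y1 = (X1 * betas[z]).sum(1) + rng.normal(0, 0.1, T)
y2 = (X2 * betas[z]).sum(1) + rng.normal(0, 0.1, T)

# For standard normal x, E[y1 y2 x1 x2^T] = average of beta beta^T
# over the mixture, so its top-k eigenspace spans the regressors.
M = np.einsum('t,t,ti,tj->ij', y1, y2, X1, X2) / T
M = (M + M.T) / 2
vals, vecs = np.linalg.eigh(M)
subspace = vecs[:, -2:]   # estimated span of the k regression vectors
```

Note each task supplies only two examples here; the subspace is still recovered because the moment averages across many tasks, which is exactly the small-data trade-off the paper formalizes.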
2019
G. Cadamuro, R. K. Vinayak, J. Blumenstock, S. Kakade, and J. N. Shapiro, “The Illusion of Change: Correcting for Bias when Inferring Changes in Sparse, Societal-Scale Data,” The Web Conference 2019, pp. 2608–2615, 2019.
Societal-scale data is playing an increasingly prominent role in social science research; examples from research on geopolitical events include questions on how emergency events impact the diffusion of information or how new policies change patterns of social interaction. Such research often draws critical inferences from observing how an exogenous event changes meaningful metrics like network degree or network entropy. However, as we show in this work, standard estimation methodologies make systematically incorrect inferences when the event also changes the sparsity of the data. To address this issue, we provide a general framework for inferring changes in social metrics when dealing with non-stationary sparsity. We propose a plug-in correction that can be applied to any estimator, including several recently proposed procedures. Using both simulated and real data, we demonstrate that the correction significantly improves the accuracy of the estimated change under a variety of plausible data generating processes. In particular, using a large dataset of calls from Afghanistan, we show that whereas traditional methods substantially overestimate the impact of a violent event on social diversity, the plug-in correction reveals the true response to be much more modest.
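The sparsity-bias phenomenon is easy to reproduce. The sketch below uses a classic plug-in correction (Miller–Madow), not the paper's more general framework, purely to show that the naive entropy estimate drops as the number of observed calls shrinks; the contact distribution is made up:

```python
import math
import random

random.seed(0)
contacts = ["a", "b", "c", "d"]
p = [0.4, 0.3, 0.2, 0.1]                    # true contact distribution
true_H = -sum(q * math.log(q) for q in p)   # true entropy (nats)

def entropies(n_calls, trials=2000):
    """Average naive and Miller-Madow-corrected entropy estimates over
    `trials` simulated call logs of n_calls observations each."""
    naive_sum = corrected_sum = 0.0
    for _ in range(trials):
        calls = random.choices(contacts, weights=p, k=n_calls)
        freqs = [calls.count(c) / n_calls for c in contacts]
        h = -sum(f * math.log(f) for f in freqs if f > 0)
        k_obs = sum(1 for f in freqs if f > 0)
        naive_sum += h
        corrected_sum += h + (k_obs - 1) / (2 * n_calls)  # plug-in correction
    return naive_sum / trials, corrected_sum / trials

naive, corrected = entropies(n_calls=10)
```

With only ten observed calls, the naive estimate sits well below the true entropy, so an event that merely reduces call volume would look like a drop in social diversity; the correction removes most of that spurious change.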
R. K. Vinayak, W. Kong, G. Valiant, and S. M. Kakade, “Maximum Likelihood Estimation for Learning Populations of Parameters,” arXiv report, 2019.
Consider a setting with N independent individuals, each with an unknown parameter p_i ∈ [0,1] drawn from some unknown distribution P⋆. After observing the outcomes of t independent Bernoulli trials per individual, i.e., X_i ∼ Binomial(t, p_i), our objective is to accurately estimate P⋆. This problem arises in numerous domains, including the social sciences, psychology, healthcare, and biology, where the size of the population under study is usually large while the number of observations per individual is often limited. Our main result shows that, in the regime where t ≪ N, the maximum likelihood estimator (MLE) is both statistically minimax optimal and efficiently computable. Precisely, for sufficiently large N, the MLE achieves the information-theoretic optimal error bound of O(1/t) for t
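In practice, an MLE over mixing distributions like this can be approximated by discretizing [0,1] and running EM on the grid weights. A minimal sketch with made-up data (the grid size, population, and two-point ground truth are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
t, N = 5, 2000
# Hidden parameters: half the population at p = 0.2, half at p = 0.8.
p_true = np.where(rng.random(N) < 0.5, 0.2, 0.8)
x = rng.binomial(t, p_true)            # t Bernoulli trials per individual

grid = np.linspace(0.025, 0.975, 39)   # candidate support for P*
w = np.full(len(grid), 1 / len(grid))  # mixing weights (the unknowns)
# Binomial likelihood of each observation at each grid point
# (the binomial coefficient cancels in the EM updates, so it is dropped).
lik = grid[None, :] ** x[:, None] * (1 - grid[None, :]) ** (t - x[:, None])

for _ in range(300):                   # EM on the mixing weights
    post = w * lik
    post /= post.sum(axis=1, keepdims=True)
    w = post.mean(axis=0)

est_mean = (w * grid).sum()            # mean of the estimated P*
```

Even with only t = 5 observations per individual, far too few to estimate any single p_i, pooling across the N individuals pins down population-level quantities such as the mean of P⋆.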
2018
Q. Huang, S. Kakade, W. Kong, and G. Valiant, “Recovering Structured Probability Matrices,” ITCS; arXiv report, 2018.
We consider the problem of accurately recovering a matrix B of size M by M, which represents a probability distribution over M^2 outcomes, given access to an observed matrix of "counts" generated by taking independent samples from the distribution B. How can structural properties of the underlying matrix B be leveraged to yield computationally efficient and information-theoretically optimal reconstruction algorithms? When can accurate reconstruction be accomplished in the sparse data regime? This basic problem lies at the core of a number of questions that are currently being considered by different communities, including building recommendation systems and collaborative filtering in the sparse data regime, community detection in sparse random graphs, learning structured models such as topic models or hidden Markov models, and the efforts from the natural language processing community to compute "word embeddings". Our results apply to the setting where B has a low-rank structure. For this setting, we propose an efficient algorithm that accurately recovers the underlying M by M matrix using Θ(M) samples. This result easily translates to Θ(M)-sample algorithms for learning topic models and learning hidden Markov models. These linear sample complexities are optimal, up to constant factors, in an extremely strong sense: even testing basic properties of the underlying matrix (such as whether it has rank 1 or 2) requires Ω(M) samples. We provide an even stronger lower bound: distinguishing whether a sequence of observations was drawn from the uniform distribution over M observations versus being generated by an HMM with two hidden states requires Ω(M) observations. This precludes sublinear-sample hypothesis tests for basic properties, such as identity or uniformity, as well as sublinear-sample estimators for quantities such as the entropy rate of HMMs.
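As a toy illustration of why low-rank structure helps, the sketch below compares the raw empirical count matrix against a rank-1 projection (truncated SVD plus renormalization). This is not the paper's algorithm, and the matrix size and sample count are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 20
u = rng.dirichlet(np.ones(M))        # rank-1 ground truth: B = u v^T
v = rng.dirichlet(np.ones(M))
B = np.outer(u, v)

n = 2000                             # a modest number of samples
counts = rng.multinomial(n, B.ravel()).reshape(M, M)
emp = counts / n                     # raw empirical estimate of B

# Exploit the known rank-1 structure: project onto the top singular pair.
U, S, Vt = np.linalg.svd(emp)
rank1 = S[0] * np.outer(U[:, 0], Vt[0])
rank1 = np.clip(rank1, 0, None)      # clip negatives, then renormalize
rank1 /= rank1.sum()

err_emp = np.abs(emp - B).sum()      # L1 error of the raw counts
err_rank1 = np.abs(rank1 - B).sum()  # L1 error after the rank-1 projection
```

The raw empirical matrix must estimate M^2 free parameters, while the rank-1 projection effectively estimates about 2M, so its L1 error is markedly lower at the same sample size, the same structural leverage the paper pushes all the way down to Θ(M) samples.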
2017
M. R. Khan and J. Blumenstock, “Determinants of Mobile Money Adoption in Pakistan,” 2017.
In this work, we analyze the problem of mobile money adoption in Pakistan, using the call detail records of a major telecom company as our input. Our results highlight that different sections of society show different patterns of adoption of digital financial services, and that user-mobility features are the most important predictors when it comes to adopting and using mobile money services.
2016
M. R. Khan and J. Blumenstock, “Predictors without Borders: Behavioral Modeling of Product Adoption in Three Developing Countries,” KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 145–154, 2016.
Billions of people around the world live without access to banks or other formal financial institutions. In the past several years, many mobile operators have launched "Mobile Money" platforms that deliver basic financial services over the mobile phone network. While many believe that these services can improve the lives of the poor, in many countries adoption of Mobile Money still remains anemic. In this paper, we develop a predictive model of Mobile Money adoption that uses billions of mobile phone communications records to understand the behavioral determinants of adoption. We describe a novel approach to feature engineering that uses a Deterministic Finite Automaton to construct thousands of behavioral metrics of phone use from a concise set of recursive rules. These features provide the foundation for a predictive model that is tested on mobile phone operators' logs from Ghana, Pakistan, and Zambia, three very different developing-country contexts. The results highlight the key correlates of Mobile Money use in each country, as well as the potential for such methods to predict and drive adoption. More generally, our analysis provides insight into the extent to which homogenized supervised learning methods can generalize across geographic contexts. We find that without careful tuning, a model that performs very well in one country frequently does not generalize to another.
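The DFA-based feature construction can be illustrated with a toy extractor. The automaton, the call/text alphabet, and the rule below are hypothetical, not among the paper's recursive rules; they only show how a transition table turns an event stream into a behavioral count:

```python
def make_dfa_counter(transitions, accepting_moves):
    """Build a feature extractor from a DFA: transitions[(state, symbol)]
    gives the next state; each move listed in accepting_moves increments
    the feature every time it fires while scanning the event stream."""
    def feature(events, start=0):
        state, count = start, 0
        for e in events:
            move = (state, e)
            if move in accepting_moves:
                count += 1
            state = transitions.get(move, 0)  # unlisted moves reset to 0
        return state and count or count
    return feature

# Hypothetical behavioral metric: count how often a call ('C') is
# immediately followed by a text ('T') in a subscriber's event stream.
call_then_text = make_dfa_counter(
    transitions={(0, "C"): 1, (1, "C"): 1},
    accepting_moves={(1, "T")},
)
```

Defining each metric as a small transition table is what lets a concise set of rules expand combinatorially into thousands of features, one counter per automaton, all computed in a single pass over the log.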