Combining disparate data sources for improved poverty prediction and mapping

Proc Natl Acad Sci U S A. 2017 Nov 14;114(46):E9783-E9792. doi: 10.1073/pnas.1700319114. Epub 2017 Oct 31.

Abstract

More than 330 million people are still living in extreme poverty in Africa. Timely, accurate, and spatially fine-grained baseline data are essential for setting policy aimed at reducing poverty. The potential of "Big Data" to estimate socioeconomic factors in Africa has been demonstrated. However, most current studies are limited to using a single data source. We propose a computational framework to accurately predict the Global Multidimensional Poverty Index (MPI) at the finest spatial granularity and coverage of 552 communes in Senegal using environmental data (related to food security, economic activity, and accessibility to facilities) and call data records (capturing individualistic, spatial, and temporal aspects of people). Our framework is based on Gaussian Process regression, a Bayesian learning technique that provides the uncertainty associated with its predictions. We perform model selection using elastic net regularization to prevent overfitting. Our results empirically demonstrate the superior accuracy obtained when using disparate data (Pearson correlation of 0.91). Our approach is also used to accurately predict important dimensions of poverty: health, education, and standard of living (Pearson correlation of 0.84-0.86). All predictions are validated using deprivations calculated from census data. Our approach can be used to generate poverty maps frequently, and its diagnostic nature is likely to assist policy makers in designing better interventions for poverty eradication.
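The pipeline named in the abstract (elastic net regularization for model selection, Gaussian Process regression with predictive uncertainty, and evaluation by Pearson correlation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic data in place of the environmental and call-record features, a hand-written coordinate-descent elastic net, an RBF kernel, and fixed hyperparameters (`alpha`, `l1_ratio`, `length_scale`, `noise`) chosen only for illustration. All function and variable names are hypothetical.

```python
import numpy as np

def elastic_net_cd(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Elastic net by coordinate descent (X standardized, y centered)."""
    n, p = X.shape
    w = np.zeros(p)
    l1, l2 = alpha * l1_ratio, alpha * (1.0 - l1_ratio)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's own contribution removed
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r / n
            # Soft-threshold (L1 part), then shrink (L2 part)
            w[j] = np.sign(rho) * max(abs(rho) - l1, 0.0) / (X[:, j] @ X[:, j] / n + l2)
    return w

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(X_train, y_train, X_test, noise=0.1, **kern):
    """GP posterior mean and standard deviation of the latent function."""
    K = rbf_kernel(X_train, X_train, **kern) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, **kern)
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ a
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(X_test, X_test, **kern).diagonal() - (v**2).sum(axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Synthetic stand-in data (assumption): 8 candidate features, 2 informative.
rng = np.random.default_rng(0)
n, p = 120, 8
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)

# Hold out the last 30 rows; standardize using training statistics only.
X_tr, X_te, y_tr, y_te = X[:90], X[90:], y[:90], y[90:]
mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
X_tr, X_te = (X_tr - mu) / sd, (X_te - mu) / sd
y_mean = y_tr.mean()

# Step 1: the elastic net keeps only features with nonzero coefficients.
w = elastic_net_cd(X_tr, y_tr - y_mean, alpha=0.2, l1_ratio=0.5)
selected = np.flatnonzero(np.abs(w) > 1e-6)

# Step 2: GP regression on the selected features, with per-point uncertainty.
mean, std = gp_predict(X_tr[:, selected], y_tr - y_mean, X_te[:, selected],
                       noise=0.05, length_scale=1.5)
mean = mean + y_mean

# Step 3: Pearson correlation between held-out predictions and targets.
r = np.corrcoef(mean, y_te)[0, 1]
print("selected features:", selected.tolist())
print("Pearson r on held-out data:", round(float(r), 3))
```

In practice the hyperparameters would be tuned (for example by cross-validation or marginal-likelihood maximization) rather than fixed, and the per-prediction standard deviation `std` is what lets a poverty map flag regions where the estimate is less reliable.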

Keywords: Gaussian process; mobile phone; poverty mapping; remote sensing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Agriculture
  • Bayes Theorem
  • Cell Phone
  • Demography
  • Environment
  • Food Supply
  • Geographic Mapping*
  • Geography
  • Health
  • Humans
  • Income
  • Information Storage and Retrieval*
  • Models, Statistical
  • Models, Theoretical*
  • Normal Distribution
  • Population
  • Poverty* / economics
  • Predictive Value of Tests
  • Remote Sensing Technology
  • Satellite Imagery
  • Senegal
  • Social Planning
  • Social Problems
  • Socioeconomic Factors
  • Urban Population

Associated data

  • figshare/10.6084/m9.figshare.4910099.v1