Predicting residential structures from open source remotely enumerated data using machine learning

PLoS One. 2018 Sep 21;13(9):e0204399. doi: 10.1371/journal.pone.0204399. eCollection 2018.

Abstract

Accurate maps of the locations of residential buildings across a region benefit a range of sectors. This is particularly true for public health programs that deliver services at the household level, such as indoor residual spraying with insecticide to help prevent malaria. Open source data from OpenStreetMap (OSM) depicting the locations and shapes of buildings are rapidly improving in quality and completeness globally. However, even in settings where all buildings have been mapped, information on whether a building is residential, commercial or another type is often available for only a small subset. Using OSM building data from Botswana and Swaziland, we identified buildings for which 'type' had been recorded through on-the-ground observations, and grouped these into two classes, "sprayable" and "not-sprayable". Ensemble machine learning, using building characteristics such as size, shape and proximity to neighbouring features, was then used to build a model predicting which of these two classes every building in the two countries fell into. Results show that the ensemble approach performed marginally, but statistically significantly, better than the best individual model, and that the ensemble model correctly classified >86% of structures (on independent test data) as sprayable or not-sprayable across both countries.
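
To make the idea of predicting from size, shape and proximity concrete, the following is a minimal illustrative sketch rather than the authors' pipeline. It assumes building footprints are given as lists of (x, y) vertices in projected metres, derives three hypothetical features (footprint area, a compactness index, and distance to the nearest neighbouring building), and combines three hand-written stand-in scorers by simple averaging. The feature names, thresholds, scorers and the averaging scheme are all assumptions for illustration; the abstract does not state which features, base learners or ensembling method were used.

```python
import math

def polygon_area(coords):
    """Footprint area via the shoelace formula (coords: list of (x, y) in metres)."""
    n = len(coords)
    total = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_perimeter(coords):
    """Sum of edge lengths, closing the ring back to the first vertex."""
    n = len(coords)
    return sum(math.dist(coords[i], coords[(i + 1) % n]) for i in range(n))

def vertex_centroid(coords):
    """Mean of the vertices: a simple stand-in for the true area centroid."""
    xs, ys = zip(*coords)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def building_features(footprints):
    """Return one feature dict per footprint: size, shape and proximity."""
    centroids = [vertex_centroid(c) for c in footprints]
    rows = []
    for i, coords in enumerate(footprints):
        area = polygon_area(coords)
        perimeter = polygon_perimeter(coords)
        # Compactness index 4*pi*A / P^2: 1.0 for a circle, lower for elongated shapes.
        compactness = 4 * math.pi * area / perimeter ** 2
        # Distance to the nearest other building centroid (infinite if alone).
        nearest = min(
            (math.dist(centroids[i], c) for j, c in enumerate(centroids) if j != i),
            default=float("inf"),
        )
        rows.append({"area": area, "compactness": compactness, "nearest": nearest})
    return rows

# Hypothetical stand-in scorers, each returning a pseudo-probability of "sprayable".
# In practice these would be trained models; the thresholds here are made up.
BASE_MODELS = [
    lambda r: 0.9 if r["area"] < 200 else 0.2,
    lambda r: 0.8 if r["compactness"] > 0.75 else 0.4,
    lambda r: 0.7 if r["nearest"] < 50 else 0.5,
]

def ensemble_predict(row, models=BASE_MODELS, threshold=0.5):
    """Average the base scores and threshold the mean (simple soft voting)."""
    p = sum(m(row) for m in models) / len(models)
    label = "sprayable" if p >= threshold else "not-sprayable"
    return label, p

# Two toy footprints: a small 10 m x 10 m hut and a large 100 m x 50 m shed.
footprints = [
    [(0, 0), (10, 0), (10, 10), (0, 10)],
    [(1000, 0), (1100, 0), (1100, 50), (1000, 50)],
]
features = building_features(footprints)
predictions = [ensemble_predict(row) for row in features]
```

Averaging base scores is only one of several ways to build an ensemble; a learned combination of base models fitted on labelled buildings and evaluated on held-out test data would be closer in spirit to what the abstract describes.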

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Housing / statistics & numerical data*
  • Machine Learning*
  • Models, Statistical

Grants and funding

This work was funded by grants from the Bill and Melinda Gates Foundation (grant numbers OPP1132900, OPP1089413, OPP1116450 and OPP1158299; https://www.gatesfoundation.org) to HJWS, KW, AFB, RAP and AM. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.