Enhancing water access monitoring through mapping multi-source usage and disaggregated geographic inequalities with machine learning and surveys

Sci Rep. 2023 Aug 18;13(1):13433. doi: 10.1038/s41598-023-39917-6.

Abstract

Monitoring safe water access in developing countries relies primarily on household health survey and census data. These surveys are often incomplete: they tend to focus on the primary water source only, are spatially coarse, and are usually conducted only every 5-10 years, a period during which significant changes in urbanisation and infrastructure provision can occur, especially in sub-Saharan Africa. In this work, we present a data-driven approach that uses and complements survey-based data on water access to provide context-specific and disaggregated monitoring. The level of access to improved water and sanitation has been shown to vary with geographical inequalities related to the availability of water resources and terrain, population density, and socio-economic determinants such as income and education. We use such data to successfully predict the level of water access in areas for which data are lacking, providing spatially explicit, community-level monitoring possibilities for mapping geographical inequalities in access. This is showcased by applying three machine learning models that use such geographical data to predict the number of presences of water access points of eight different access types across Uganda, at a 1 km by 1 km grid resolution. Two Multi-Layer Perceptron (MLP) models and a Maximum Entropy (MaxEnt) model are developed and compared, and the former are shown to consistently outperform the latter. The best-performing neural network model achieved a True Positive Rate of 0.89 and a False Positive Rate of 0.24, compared to 0.85 and 0.46 respectively for the MaxEnt model. The models improve on previous work on water point modelling through the use of neural networks and by introducing the True Positive Rate and False Positive Rate as better evaluation metrics that can also be used to assess the MaxEnt model. We also present a scaling method to move from predicting only the relative probability of water point presences to predicting the absolute number of presences.
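As an illustration of the two evaluation metrics named above, the following minimal Python sketch computes a True Positive Rate and a False Positive Rate from binary presence/absence labels per grid cell. It is not the authors' code: the function name `tpr_fpr`, the 0/1 encoding, and the toy labels are assumptions made for illustration only, and the abstract does not specify how predicted probabilities are thresholded into presences.

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for two equal-length sequences of 0/1 labels.

    y_true: observed presence (1) or absence (0) of a water point per grid cell.
    y_pred: predicted presence (1) or absence (0) for the same cells.
    Both names and the 0/1 encoding are illustrative assumptions.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # TPR = TP / (TP + FN): share of true presences that the model finds.
    tpr = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    # FPR = FP / (FP + TN): share of true absences wrongly flagged as presences.
    fpr = fp / (fp + tn) if (fp + tn) > 0 else float("nan")
    return tpr, fpr


# Toy example with made-up labels for eight grid cells.
observed = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
print(tpr_fpr(observed, predicted))
```

A higher True Positive Rate combined with a lower False Positive Rate indicates a better model, which is the sense in which the reported 0.89 and 0.24 for the best MLP compare favourably with 0.85 and 0.46 for MaxEnt.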
To challenge both the model results and the more standard health surveys, a new household-level survey was carried out in Bushenyi, a mid-sized town in the south-west of Uganda, asking specifically about the multiple water sources that households use. On average, Bushenyi households reported using 1.9 water sources. The survey further showed that the actual presence of a source does not always imply that it is used. Relying solely on models for water access monitoring is therefore not an option. Household surveys remain necessary for this purpose, but they should be extended with questions on the multiple sources that households use.