Model and variable selection using machine learning methods with applications to childhood stunting in Bangladesh

Jahidur Rahman Khan; Jabed H Tomal; Enayetur Raheem

doi:10.1080/17538157.2021.1904938

Inform Health Soc Care. 2021 Dec 2;46(4):425-442. doi: 10.1080/17538157.2021.1904938. Epub 2021 Apr 14.

Authors

Jahidur Rahman Khan¹ ², Jabed H Tomal³, Enayetur Raheem²

Affiliations

¹ Health Research Institute, Faculty of Health, University of Canberra, Canberra, Australia.
² Department of Climate and Environment Health, Biomedical Research Foundation, Dhaka, Bangladesh.
³ Department of Mathematics and Statistics, Thompson Rivers University, Kamloops, British Columbia, Canada.

PMID: 33851897
DOI: 10.1080/17538157.2021.1904938

Abstract

Childhood stunting is a serious public health concern in Bangladesh. Earlier research used conventional statistical methods to identify the risk factors of stunting, and very little is known about the applicability and usefulness of machine learning (ML) methods, which can identify risk factors of various health conditions from complex data. This research evaluates the performance of ML methods in predicting stunting among children under 5 years of age using 2014 Bangladesh Demographic and Health Survey data. It also identifies the variables that are important for predicting stunting in Bangladesh. Among the selected ML methods, gradient boosting gives the smallest misclassification error in predicting stunting, followed by random forests, support vector machines, classification tree, and logistic regression with forward-stepwise selection. The 10 most important variables for predicting childhood stunting in Bangladesh are, in order of importance, child age, wealth index, maternal education, preceding birth interval, paternal education, division, household size, maternal age at first birth, maternal nutritional status, and parental age. Our study shows that ML can support the building of prediction models and emphasizes the role of demographic, socioeconomic, nutritional, and environmental factors in understanding stunting in Bangladesh.

Keywords: Bangladesh; Machine learning; prediction; stunting; variable importance.
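
The abstract ranks the ML methods by misclassification error, that is, the share of children whose predicted stunting status differs from the observed status. The sketch below shows one way such a ranking could be computed. It is a minimal illustration and not the authors' code: the function names, the model labels (`model_a`, `model_b`, `model_c`), and the toy labels and predictions are all hypothetical and are not taken from the survey data or the paper's results.

```python
from typing import Dict, List, Sequence, Tuple


def misclassification_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return the fraction of observations whose predicted class is wrong."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def rank_models(
    y_true: Sequence[int], predictions: Dict[str, Sequence[int]]
) -> List[Tuple[str, float]]:
    """Sort models from smallest to largest misclassification error."""
    errors = {
        name: misclassification_error(y_true, y_pred)
        for name, y_pred in predictions.items()
    }
    return sorted(errors.items(), key=lambda item: item[1])


# Hypothetical held-out labels: 1 = stunted, 0 = not stunted.
y_true = [1, 0, 0, 1, 1, 0, 0, 1, 0, 0]

# Hypothetical predictions from three unnamed classifiers (assumed values).
toy_predictions = {
    "model_a": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],  # 1 of 10 wrong
    "model_b": [1, 0, 1, 1, 0, 0, 0, 1, 0, 1],  # 3 of 10 wrong
    "model_c": [0, 0, 1, 1, 0, 0, 1, 1, 0, 0],  # 4 of 10 wrong
}

ranking = rank_models(y_true, toy_predictions)
for name, error in ranking:
    print(f"{name}: misclassification error = {error:.2f}")
```

In practice the predictions would come from fitted classifiers evaluated on held-out data, and the error would typically be averaged over repeated splits or cross-validation folds before models are compared.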

MeSH terms

Aged
Bangladesh / epidemiology
Child
Growth Disorders* / epidemiology
Humans
Infant
Logistic Models
Machine Learning*
Risk Factors