Learning Bayesian networks from demographic and health survey data

J Biomed Inform. 2021 Jan:113:103588. doi: 10.1016/j.jbi.2020.103588. Epub 2020 Nov 17.

Abstract

Child mortality from preventable diseases such as pneumonia and diarrhoea in low and middle-income countries remains a serious global challenge. We combine knowledge with available Demographic and Health Survey (DHS) data from India, to construct Causal Bayesian Networks (CBNs) and investigate the factors associated with childhood diarrhoea. We make use of freeware tools to learn the graphical structure of the DHS data with score-based, constraint-based, and hybrid structure learning algorithms. We investigate the effect of missing values, sample size, and knowledge-based constraints on each of the structure learning algorithms and assess their accuracy with multiple scoring functions. Weaknesses in the survey methodology and data available, as well as the variability in the CBNs generated by the different algorithms, mean that it is not possible to learn a definitive CBN from data. However, knowledge-based constraints are found to be useful in reducing the variation in the graphs produced by the different algorithms, and produce graphs which are more reflective of the likely influential relationships in the data. Furthermore, valuable insights are gained into the performance and characteristics of the structure learning algorithms. Two score-based algorithms in particular, TABU and FGES, demonstrate many desirable qualities; (a) with sufficient data, they produce a graph which is similar to the reference graph, (b) they are relatively insensitive to missing values, and (c) behave well with knowledge-based constraints. The results provide a basis for further investigation of the DHS data and for a deeper understanding of the behaviour of the structure learning algorithms when applied to real-world settings.

Keywords: Directed acyclic graph; Graphical models; Health informatics; Structure learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Bayes Theorem
  • Child
  • Demography
  • Humans
  • Knowledge Bases*
  • Sample Size