A novel dynamic Bayesian network approach for data mining and survival data analysis

BMC Med Inform Decis Mak. 2022 Sep 22;22(1):251. doi: 10.1186/s12911-022-02000-7.

Abstract

Background: Censoring is the primary challenge in survival modeling, especially in human health studies. Classical methods are limited either in their range of application, like the Kaplan-Meier estimator, or by restrictive assumptions, like the Cox regression model. Machine learning algorithms, on the other hand, commonly rely on high-dimensional data and ignore censoring. In addition, these algorithms are more difficult to understand and apply. We propose a novel approach based on the Bayesian network to address these issues.
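For reference, the Kaplan-Meier estimator named above deals with censoring by keeping each censored subject in the risk set until its censoring time. Its product-limit form is S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i is the number of subjects still at risk just before t_i. The following minimal Python sketch of that textbook formula is given only as an illustration. It is not the implementation used in the study, and the function name kaplan_meier is hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct event time.

    times:  follow-up time of each subject
    events: 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (event_time, estimated_survival) pairs.
    """
    # Number of observed events at each distinct time.
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Risk set: everyone still under observation at t. A subject
        # censored at time c contributes to every risk set with t <= c.
        at_risk = sum(1 for ti in times if ti >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve
```

For example, kaplan_meier([2, 3, 3, 5, 7, 8], [1, 0, 1, 1, 0, 1]) gives survival of about 0.833, 0.667, 0.444 and 0 at times 2, 3, 5 and 8. The subject censored at time 3 still counts in the risk set at time 3 and then drops out without lowering the curve.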

Methods: We proposed a two-slice temporal Bayesian network model for survival data, introducing the survival and censoring status at each observed time as the dynamic states. The structure of the directed acyclic graph was learned with a score-based algorithm, and the parameters were learned with a likelihood-based approach. We conducted a simulation study to assess the performance of our model in comparison with the Kaplan-Meier estimator and Cox proportional hazards regression. We defined various scenarios according to the sample size, the censoring rate, and the shapes of the survival and censoring distributions across time. Finally, we fit the model to a real-world dataset of 760 patients who underwent gastrectomy for gastric cancer. The model was validated with the hold-out technique, based on the posterior classification error. The performance of our survival model was compared with that of the Kaplan-Meier and Cox proportional hazards models.
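To see how survival records can feed a two-slice temporal network, each subject's (time, event) pair can be expanded into a status at every observed time point. Each pair of consecutive slices then becomes one training sample for the transition distribution. The Python sketch below illustrates that idea under simplifying assumptions that are not taken from the paper. A single three-valued status (alive, dead, censored) stands in for the survival and censoring states. Dead and censored are treated as absorbing. The network structure is fixed to P(S[k+1] | S[k], X), with one hypothetical discrete covariate X, rather than learned by a score-based search. Its parameters are relative-frequency (maximum-likelihood) estimates. All names (state_sequence, fit_transition_cpt, grid, x) are hypothetical.

```python
from collections import Counter, defaultdict

# Simplifying assumption: one three-valued status per time slice.
ALIVE, DEAD, CENSORED = "alive", "dead", "censored"


def state_sequence(time, event, grid):
    """Status of one subject at each time point in `grid`.

    The subject is ALIVE before its follow-up time. From that time on it is
    DEAD (event == 1) or CENSORED (event == 0), both treated as absorbing.
    """
    end = DEAD if event == 1 else CENSORED
    return [ALIVE if t < time else end for t in grid]


def fit_transition_cpt(subjects, grid):
    """Relative-frequency (maximum-likelihood) estimate of P(S[k+1] | S[k], X).

    subjects: iterable of (time, event, x) tuples, x a discrete covariate
    Returns {(previous_state, x): {next_state: probability}}.
    The structure is assumed fixed here, not learned from the data.
    """
    counts = defaultdict(Counter)
    for time, event, x in subjects:
        seq = state_sequence(time, event, grid)
        # Each consecutive pair of slices is one sample of the transition.
        for prev, nxt in zip(seq, seq[1:]):
            counts[(prev, x)][nxt] += 1
    # Normalise the counts for each parent configuration into probabilities.
    return {
        parents: {state: n / sum(c.values()) for state, n in c.items()}
        for parents, c in counts.items()
    }
```

On the toy input fit_transition_cpt([(2, 1, 0), (4, 0, 0), (3, 1, 1), (5, 0, 1)], [1, 2, 3, 4]), the row for (alive, x=0) is {alive: 0.5, dead: 0.25, censored: 0.25} and the row for (alive, x=1) gives dead a probability of 0.2. A full dynamic Bayesian network would also search over candidate parent sets with a network score, which is not attempted here.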

Results: The simulation study showed that the dynamic Bayesian network (DBN) reduced bias in many scenarios compared with Cox regression and Kaplan-Meier, especially at late survival times. In the real-world data, the structure of the DBN model was consistent with the findings of the classical Kaplan-Meier and Cox regression approaches. The posterior classification error obtained from the validation technique did not exceed 0.04, indicating that the network predicted the state variables with at least 96% accuracy.
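The link between the reported error and accuracy is simple arithmetic: accuracy = 1 - error, so an error of at most 0.04 means at least 96% of held-out states were classified correctly. The Python sketch below shows one common way such a hold-out classification error can be computed. It assumes, without confirmation from the paper, that the predicted state is the one with the highest posterior probability given its parents. The function name, the toy table and the toy test cases are hypothetical. It also assumes that every parent configuration in the test split was seen during training.

```python
def posterior_classification_error(cpt, test_pairs):
    """Share of held-out cases whose predicted state differs from the observed one.

    cpt:        {parents: {state: probability}} estimated on the training split
    test_pairs: list of (parents, observed_state) from the hold-out split
    The prediction is the state with the largest posterior probability.
    """
    errors = 0
    for parents, observed in test_pairs:
        posterior = cpt[parents]
        predicted = max(posterior, key=posterior.get)
        if predicted != observed:
            errors += 1
    return errors / len(test_pairs)


# Toy illustration: 1 misclassified case out of 25 held-out cases.
toy_cpt = {("alive", 0): {"alive": 0.9, "dead": 0.1}}
toy_pairs = [(("alive", 0), "alive")] * 24 + [(("alive", 0), "dead")]
error = posterior_classification_error(toy_cpt, toy_pairs)
accuracy = 1 - error
```

In this toy case, error is 1/25 = 0.04 and accuracy is 0.96, which corresponds to the boundary reported in the abstract. The numbers are illustrative only and are not the study's data.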

Conclusions: Our proposed dynamic Bayesian network model could be used as a data mining technique in the context of survival data analysis. The advantages of this approach are its ability to perform feature selection, straightforward interpretation, capacity to handle high-dimensional data, and reliance on few assumptions.

Keywords: Directed acyclic graph; Dynamic Bayesian network; Gastric cancer; Survival analysis.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Data Analysis*
  • Data Mining*
  • Humans
  • Likelihood Functions
  • Proportional Hazards Models
  • Survival Analysis