Embedding-based link predictions to explore latent comorbidity of chronic diseases

Health Inf Sci Syst. 2022 Dec 30;11(1):2. doi: 10.1007/s13755-022-00206-7. eCollection 2023 Dec.

Abstract

Purpose: Comorbidity describes a patient having more than one chronic disease at the same time. It is a significant health issue that affects people worldwide. This study aims to use machine learning and graph theory to predict the comorbidity of chronic diseases.

Methods: A patient-disease bipartite graph was constructed from administrative claim data. The bipartite graph projection approach was then used to create the comorbidity network. For the link prediction task, three embedding-based graph machine learning models (node2vec, graph neural networks and a hand-crafted approach), each with different variants, were applied to the comorbidity network and their performance was compared. This study also considered three commonly used similarity-based link prediction approaches (Jaccard coefficient, Adamic-Adar index and resource allocation index) for performance comparison.
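The projection and the three similarity indices named above can be illustrated with a minimal sketch. It is not the study's implementation: the patient records, disease names and function names are hypothetical, and the projection here is unweighted (two diseases are linked if at least one patient has both), which is an assumption since the abstract does not state how edges were defined.

```python
from collections import defaultdict
from itertools import combinations
from math import log

# Hypothetical toy records: patient -> set of diagnosed chronic diseases.
patient_diseases = {
    "p1": {"diabetes", "hypertension"},
    "p2": {"diabetes", "hypertension", "ckd"},
    "p3": {"hypertension", "copd"},
    "p4": {"diabetes", "ckd"},
}

def project_to_comorbidity_network(records):
    """Project the patient-disease bipartite graph onto diseases.

    Assumption: an unweighted edge joins two diseases whenever at
    least one patient has both.
    """
    adj = defaultdict(set)
    for diseases in records.values():
        for a, b in combinations(sorted(diseases), 2):
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)

def jaccard(adj, u, v):
    """Shared neighbours divided by all neighbours of u or v."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over shared neighbours."""
    return sum(1.0 / log(len(adj[w])) for w in adj[u] & adj[v] if len(adj[w]) > 1)

def resource_allocation(adj, u, v):
    """Sum of 1 / degree over shared neighbours."""
    return sum(1.0 / len(adj[w]) for w in adj[u] & adj[v])

network = project_to_comorbidity_network(patient_diseases)

# Score a disease pair that is not yet linked in the toy network.
pair = ("copd", "diabetes")
scores = {
    "jaccard": jaccard(network, *pair),
    "adamic_adar": adamic_adar(network, *pair),
    "resource_allocation": resource_allocation(network, *pair),
}
```

In this toy network, a higher score for an unlinked pair such as COPD and diabetes would be read as a stronger candidate for a latent comorbidity link.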

Results: The embedding-based hand-crafted features technique outperformed the remaining similarity-based and embedding-based models. In particular, the hand-crafted technique with the extreme gradient boosting classifier achieved the highest accuracy (91.67%), followed by the same technique with the logistic regression classifier (90.26%). For this shallow embedding method, the Jaccard coefficient and the degree centrality of the original chronic disease were the most important features for comorbidity prediction.
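As a rough illustration of what a hand-crafted feature vector for a disease pair might look like, the sketch below computes the two features reported as most important, the Jaccard coefficient and degree centrality. The full feature set is not given in the abstract, so the adjacency data, the function names and the extra common-neighbour count are hypothetical. Degree centrality is taken here as degree divided by (number of nodes - 1), a common normalisation that is assumed rather than confirmed by the source.

```python
# Hypothetical comorbidity network as an adjacency map (disease -> neighbours).
comorbidity_adj = {
    "diabetes": {"hypertension", "ckd"},
    "hypertension": {"diabetes", "ckd", "copd"},
    "ckd": {"diabetes", "hypertension"},
    "copd": {"hypertension"},
}

def degree_centrality(adj, node):
    """Degree normalised by the maximum possible degree (n - 1)."""
    n = len(adj)
    return len(adj[node]) / (n - 1) if n > 1 else 0.0

def pair_features(adj, u, v):
    """Build an illustrative feature row for the candidate link (u, v)."""
    common = adj[u] & adj[v]
    union = adj[u] | adj[v]
    return {
        "jaccard": len(common) / len(union) if union else 0.0,
        "degree_centrality_u": degree_centrality(adj, u),
        "degree_centrality_v": degree_centrality(adj, v),
        "common_neighbours": len(common),  # extra feature, assumed for illustration
    }

# u stands in for the disease the patient already has, v for the candidate comorbidity.
features = pair_features(comorbidity_adj, "diabetes", "copd")
```

Rows like this, labelled by whether the link exists, are the kind of input a classifier such as extreme gradient boosting or logistic regression would be trained on.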

Conclusion: The proposed framework can be used to predict the comorbidity of chronic diseases at an early stage of hospital admission. The prediction outcome could therefore be valuable for medical practice, giving healthcare providers more control over their services and lowering expenses.