Interrelated feature selection from health surveys using domain knowledge graph

Health Inf Sci Syst. 2023 Nov 16;11(1):54. doi: 10.1007/s13755-023-00254-7. eCollection 2023 Dec.

Abstract

Finding patterns among risk factors and chronic illnesses can suggest similar causes, provide guidance to improve healthy lifestyles, and give clues to possible treatments for outliers. Prior studies have typically addressed data challenges in isolation, using single-disease datasets. However, predicting multiple diseases together is more helpful for establishing a healthy lifestyle than investigating one disease. From the patient's point of view, features that predict the likelihood of many diseases can make health advice more effective and keep it generalized and contemporary. We construct and present a novel knowledge-based qualitative method that removes redundant features from a dataset and redefines the outliers. The results of our trials on five annual chronic disease health surveys demonstrate that our knowledge graph-based feature selection, when applied to a range of machine learning and deep learning multi-label classifiers, can improve classification performance. Our methodology is compatible with future directions such as graph neural networks. It provides clinicians with an efficient process for selecting the most relevant health survey questions and responses for one or many human organ systems.

Keywords: Chronic illness; Feature selection; Knowledge graphs; Risk factors.