Causal knowledge graph construction and evaluation for clinical decision support of diabetic nephropathy

Kewei Lyu; Yu Tian; Yong Shang; Tianshu Zhou; Ziyue Yang; Qianghua Liu; Xi Yao; Ping Zhang; Jianghua Chen; Jingsong Li

doi:10.1016/j.jbi.2023.104298

J Biomed Inform. 2023 Mar:139:104298. doi: 10.1016/j.jbi.2023.104298. Epub 2023 Jan 30.

Authors

Kewei Lyu¹, Yu Tian¹, Yong Shang², Tianshu Zhou², Ziyue Yang¹, Qianghua Liu¹, Xi Yao³, Ping Zhang³, Jianghua Chen³, Jingsong Li⁴

Affiliations

¹ Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China.
² Research Center for Healthcare Data Science, Zhejiang Lab, Hangzhou, China.
³ Kidney Disease Center, the First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, China.
⁴ Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China; Research Center for Healthcare Data Science, Zhejiang Lab, Hangzhou, China. Electronic address: ljs@zju.edu.cn.

PMID: 36731730
DOI: 10.1016/j.jbi.2023.104298

Abstract

Background: Many important clinical decisions depend on causal knowledge (CK). Although many causal knowledge bases for medicine have been constructed, a comprehensive evaluation based on real-world data and methods for handling potential knowledge noise are still lacking.

Objective: The objectives of our study are threefold: (1) to propose a framework for constructing a large-scale, high-quality causal knowledge graph (CKG); (2) to design methods for knowledge noise reduction that improve the quality of the CKG; and (3) to evaluate the knowledge completeness and accuracy of the CKG using real-world data.

Material and methods: We extracted causal triples from three knowledge sources (SemMedDB, UpToDate and Churchill's Pocketbook of Differential Diagnosis) using rule-based methods and language models, performed ontological encoding, and then designed semantic modeling between electronic health record (EHR) data and the CKG to complete knowledge instantiation. We proposed two graph pruning strategies (co-occurrence ratio and causality ratio) to reduce the potential noise introduced by SemMedDB. Finally, we evaluated the CKG using the diagnostic decision support (DDS) of diabetic nephropathy (DN) as a real-world case, with data from a Chinese hospital EHR system covering October 2010 to October 2020. Knowledge completeness and accuracy were evaluated using three state-of-the-art embedding methods (R-GCN, MHGRN and MedPath), annotated clinical text, and expert review.
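To make the pruning idea concrete, the following is a minimal sketch of threshold-based pruning with a co-occurrence ratio. The abstract does not define either ratio, so everything here is an assumption for illustration: the ratio is taken to be the share of patient records containing the cause concept that also contain the effect concept, and the function names, toy records, and threshold are hypothetical. The causality ratio is not modeled.

```python
# Illustrative sketch only: the definition of "co-occurrence ratio" below is an
# assumption, not the definition used in the study.

def cooccurrence_ratio(records, cause, effect):
    """Share of records containing `cause` that also contain `effect`."""
    with_cause = [r for r in records if cause in r]
    if not with_cause:
        return 0.0
    return sum(1 for r in with_cause if effect in r) / len(with_cause)


def prune(triples, records, threshold):
    """Keep only triples whose head and tail co-occur often enough in the records."""
    return [
        (head, relation, tail)
        for (head, relation, tail) in triples
        if cooccurrence_ratio(records, head, tail) >= threshold
    ]


# Toy patient records, each a set of concepts observed for one patient (hypothetical).
records = [
    {"diabetes", "proteinuria", "diabetic nephropathy"},
    {"diabetes", "proteinuria"},
    {"diabetes", "headache"},
    {"hypertension", "headache"},
]

# Toy candidate causal triples (hypothetical).
triples = [
    ("diabetes", "CAUSES", "proteinuria"),
    ("diabetes", "CAUSES", "headache"),
]

kept = prune(triples, records, threshold=0.5)
print(kept)
```

With these toy records, the second triple falls below the threshold and is dropped; in practice the threshold would need to be tuned against expert review or downstream model performance.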

Results: The graph included 153,289 concepts and 1,719,968 causal triples. Data from 1427 inpatients were used for evaluation. Combining the three knowledge sources gave better results than using SemMedDB alone (three models: area under the receiver operating characteristic curve (AUC): p < 0.01, F1: p < 0.01), and the graph covered 93.9% of the causal relations between diseases and diagnostic evidence recorded in clinical text. Causal relations played a vital role among all relations related to disease progression for DDS of DN (three models: AUC: p > 0.05, F1: p > 0.05), and after pruning, the knowledge accuracy of the CKG improved significantly (three models: AUC: p < 0.01, F1: p < 0.01; expert review: average accuracy: +5.5%).
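A coverage figure of this kind can be read as the fraction of annotated disease-evidence relations that also appear as edges in the graph. The sketch below shows that calculation under stated assumptions: relations are matched as exact (disease, evidence) pairs, and the function name and toy data are hypothetical rather than taken from the study.

```python
# Illustrative sketch only: exact pair matching is an assumption; the study may
# have matched relations through ontology codes or other normalization.

def coverage(annotated_pairs, graph_triples):
    """Fraction of annotated (head, tail) pairs that appear as an edge in the graph."""
    graph_pairs = {(head, tail) for head, _relation, tail in graph_triples}
    if not annotated_pairs:
        return 0.0
    covered = sum(1 for pair in annotated_pairs if pair in graph_pairs)
    return covered / len(annotated_pairs)


# Toy graph and toy annotations (hypothetical).
graph_triples = [
    ("diabetic nephropathy", "CAUSES", "proteinuria"),
    ("diabetic nephropathy", "CAUSES", "edema"),
    ("diabetes", "CAUSES", "diabetic nephropathy"),
]
annotated_pairs = [
    ("diabetic nephropathy", "proteinuria"),
    ("diabetic nephropathy", "edema"),
    ("diabetes", "diabetic nephropathy"),
    ("diabetic nephropathy", "anemia"),
]

print(coverage(annotated_pairs, graph_triples))
```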

Conclusions: The results demonstrated that our proposed CKG could completely and accurately capture the abstract CK underlying the concrete EHR data, and that the pruning strategies improved the knowledge accuracy of the CKG. The CKG has the potential to be applied to the DDS of diseases.

Keywords: Causal knowledge; Diabetic nephropathy; Electronic health record; Knowledge graph.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Decision Support Systems, Clinical*
Diabetes Mellitus*
Diabetic Nephropathies*
Humans
Language
Pattern Recognition, Automated
Semantics