Machine learning predicts cancer-associated deep vein thrombosis using clinically available variables

Shuai Jin; Dan Qin; Bao-Sheng Liang; Li-Chuan Zhang; Xiao-Xia Wei; Yu-Jie Wang; Bing Zhuang; Tong Zhang; Zhen-Peng Yang; Yi-Wei Cao; San-Li Jin; Ping Yang; Bo Jiang; Ben-Qiang Rao; Han-Ping Shi; Qian Lu

doi:10.1016/j.ijmedinf.2022.104733

Int J Med Inform. 2022 May:161:104733. doi: 10.1016/j.ijmedinf.2022.104733. Epub 2022 Mar 5.

Authors

Shuai Jin¹, Dan Qin¹, Bao-Sheng Liang², Li-Chuan Zhang¹, Xiao-Xia Wei¹, Yu-Jie Wang¹, Bing Zhuang¹, Tong Zhang¹, Zhen-Peng Yang³, Yi-Wei Cao¹, San-Li Jin¹, Ping Yang¹, Bo Jiang⁴, Ben-Qiang Rao³, Han-Ping Shi³, Qian Lu⁵

Affiliations

¹ Division of Medical & Surgical Nursing, School of Nursing, Peking University, Beijing, China.
² Department of Biostatistics, School of Public Health, Peking University, Beijing, China. Electronic address: liangbs@hsc.pku.edu.cn.
³ Department of Gastrointestinal Surgery, Beijing Shijitan Hospital, Capital Medical University/The 9th Clinical Medical College, Peking University, Beijing, China.
⁴ Department of Medical Oncology, Beijing Shijitan Hospital, Capital Medical University/The 9th Clinical Medical College, Peking University, Beijing, China.
⁵ Division of Medical & Surgical Nursing, School of Nursing, Peking University, Beijing, China. Electronic address: luqian@bjmu.edu.cn.

PMID: 35299099

Abstract

Purpose: To develop and validate machine learning (ML) models for cancer-associated deep vein thrombosis (DVT) and to compare the performance of these models with the Khorana score (KS).

Methods: We randomly extracted data on 2100 patients with cancer between Jan. 1, 2017, and Oct. 31, 2019, and enrolled 1035 patients who underwent Doppler ultrasonography. Univariate analysis and Lasso regression were applied to select important predictors. Models were trained and their hyperparameters tuned on 70% of the data using ten-fold cross-validation. The remaining 30% of the data were used to compare the five ML models (linear discriminant analysis [LDA], logistic regression [LR], classification tree [CT], random forest [RF], and support vector machine [SVM]) and the KS on seven indicators (area under the receiver operating characteristic curve [AUC], sensitivity, specificity, accuracy, balanced accuracy, Brier score, and calibration curve).
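As an illustration of how the numeric indicators named above are defined, the following is a minimal sketch in plain Python. It is not the authors' code: the function names, the 0.5 classification threshold, and the toy labels and probabilities are assumptions made for this example only. The calibration curve is graphical and is not covered here.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0  # assumed cut-off, not from the paper
        if y == 1 and pred == 1:
            tp += 1
        elif y == 0 and pred == 1:
            fp += 1
        elif y == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative."""
    pos: List[float] = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg: List[float] = [p for y, p in zip(y_true, y_prob) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def summarize(y_true: Sequence[int], y_prob: Sequence[float],
              threshold: float = 0.5) -> Dict[str, float]:
    """Collect the six numeric indicators for one model on a test set."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "auc": auc(y_true, y_prob),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(y_true),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "brier": brier_score(y_true, y_prob),
    }


# Toy data (invented): 1 = DVT present, 0 = DVT absent.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_probs = [0.9, 0.2, 0.6, 0.4, 0.7, 0.3, 0.1, 0.5]
toy_metrics = summarize(toy_labels, toy_probs)
```

Balanced accuracy averages sensitivity and specificity, so it is less affected than plain accuracy when the outcome is uncommon, as with the 22.3% incidence reported in the Results.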

Results: The incidence of cancer-associated DVT was 22.3%. According to RF, the top five predictors were D-dimer level, age, Charlson Comorbidity Index (CCI), length of stay (LOS), and history of venous thromboembolism (VTE). Only LDA (AUC = 0.773) and LR (AUC = 0.772) outperformed KS (AUC = 0.642), and combination with D-dimer showed improved performance in all models. A nomogram and a web calculator (https://webcalculatorofcancerassociateddvt.shinyapps.io/dynnomapp/) were used to visualize the best recommended LR model.
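A nomogram or web calculator built on an LR model converts a weighted sum of predictors into a probability through the logistic function. The sketch below shows only that general mechanism. The intercept, coefficients, predictor names, and units are hypothetical values invented for this example; they are not the published model's parameters and must not be used for clinical estimation.

```python
import math
from typing import Dict

# HYPOTHETICAL values for illustration only; NOT the published model.
HYPOTHETICAL_INTERCEPT = -3.0
HYPOTHETICAL_COEFS: Dict[str, float] = {
    "d_dimer": 0.30,       # assumed unit: mg/L
    "age": 0.02,           # assumed unit: years
    "cci": 0.15,           # Charlson Comorbidity Index points
    "los": 0.03,           # assumed unit: days
    "previous_vte": 1.00,  # 1 if prior VTE, else 0
}


def predicted_risk(patient: Dict[str, float],
                   intercept: float = HYPOTHETICAL_INTERCEPT,
                   coefs: Dict[str, float] = HYPOTHETICAL_COEFS) -> float:
    """Logistic regression risk: 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(coefs[name] * patient[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-linear))


# Two invented patients who differ only in D-dimer level.
low_d_dimer = {"d_dimer": 0.5, "age": 60, "cci": 2, "los": 7, "previous_vte": 0}
high_d_dimer = dict(low_d_dimer, d_dimer=5.0)

risk_low = predicted_risk(low_d_dimer)
risk_high = predicted_risk(high_d_dimer)
```

With a positive coefficient on D-dimer, the second patient receives the higher predicted risk, which is the behaviour a nomogram expresses graphically as points along each predictor's axis.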

Conclusion: This study developed and validated cancer-associated DVT predictive models using five ML algorithms and visualized the best recommended model using a nomogram and web calculator. The nomogram and web calculator developed in this study may assist doctors and nurses in evaluating individualized cancer-associated DVT risk and making decisions. However, other prospective cohort studies should be conducted to externally validate the recommended model.

Keywords: Decision making; Deep vein thrombosis; Machine learning; Neoplasms; Risk stratification.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Humans
Logistic Models
Machine Learning
Neoplasms* / complications
Neoplasms* / epidemiology
Prospective Studies
Venous Thrombosis* / diagnosis
Venous Thrombosis* / epidemiology
Venous Thrombosis* / etiology