Nonlaboratory-based risk assessment model for coronary heart disease screening: Model development and validation

Int J Med Inform. 2022 Mar 18:162:104746. doi: 10.1016/j.ijmedinf.2022.104746. Online ahead of print.

Abstract

Background: Identifying groups at high risk of coronary heart disease (CHD) is important for reducing CHD mortality. Although machine learning methods have been introduced, many require laboratory or imaging parameters that are not always readily available, which limits their wide application.

Objective: The aim of this study was to develop and validate a simple and efficient joint machine learning model for identifying individuals at high risk of CHD using easily obtainable nonlaboratory parameters.

Methods: This prospective study used data from the Henan Rural Cohort Study, which was conducted in rural areas of Henan Province, China, between July 2015 and September 2017. A joint machine learning model was developed by selecting and combining four base machine learning algorithms: logistic regression (LR), artificial neural network (ANN), random forest (RF), and gradient boosting machine (GBM). We used readily accessible variables, including demographics, medical and family history, lifestyle and dietary factors, and anthropometric data, to inform the model. The model was externally validated in a cohort of individuals from the Dongfeng-Tongji cohort study. Model discrimination was assessed using the area under the receiver operating characteristic curve (AUC), and calibration was measured using the Brier score (BS).
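
As an illustration of the two evaluation metrics named above, the following is a minimal Python sketch, not the authors' implementation. The AUC is computed as the proportion of case/non-case pairs in which the case receives the higher predicted risk (ties count as one half), and the Brier score as the mean squared difference between predicted risk and the observed 0/1 outcome. The abstract does not state how the base models were combined, so the unweighted averaging in `average_probabilities` is an assumption made only for illustration; all function names and example values are hypothetical.

```python
from typing import List, Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and equal in length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the share of (case, non-case) pairs ranked correctly.

    Ties contribute 0.5. This pairwise form is quadratic in sample size,
    which is fine for a small illustration but slow for a full cohort.
    """
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_cases = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not non_cases:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


def average_probabilities(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """ASSUMPTION: combine base models by an unweighted mean of their risks."""
    n_models = len(model_probs)
    if n_models == 0:
        raise ValueError("at least one base model is required")
    return [sum(per_person) / n_models for per_person in zip(*model_probs)]


if __name__ == "__main__":
    # Hypothetical outcomes and predicted risks from two base models.
    outcomes = [0, 0, 1, 1]
    lr_risk = [0.10, 0.40, 0.35, 0.80]
    rf_risk = [0.20, 0.30, 0.60, 0.70]

    joint_risk = average_probabilities([lr_risk, rf_risk])
    print("joint AUC:", auc(outcomes, joint_risk))
    print("joint Brier score:", brier_score(outcomes, joint_risk))
```

A lower Brier score indicates predicted risks closer to the observed outcomes, while a higher AUC indicates better separation of cases from non-cases; the two are reported together because a model can rank individuals well and still be poorly calibrated.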

Results: A total of 38,716 participants (mean [SD] age, 55.64 [12.19] years; 23,449 [60.6%] female) from the Henan Rural Cohort Study and 17,958 participants (mean [SD] age, 62.74 [7.59] years; 10,076 [56.1%] female) from the Dongfeng-Tongji cohort study were included in the analysis. Age, waist circumference, pulse pressure, heart rate, family history of CHD, education level, family history of type 2 diabetes mellitus (T2DM), and family history of dyslipidaemia were strongly associated with the development of CHD. In internal validation, the model demonstrated good discrimination (AUC, 0.844 [95% CI, 0.828-0.860]) and acceptable calibration (BS, 0.066). In external validation, the model showed useful discrimination (AUC, 0.792 [95% CI, 0.774-0.810]) and robust calibration (BS, 0.069).

Conclusions: In this study, a novel, simple machine learning-based model comprising readily accessible variables accurately identified individuals at high risk of CHD. This model has the potential to be widely applied for large-scale CHD screening, especially in medical resource-constrained settings.

Trial registration: The Henan Rural Cohort Study has been registered at the Chinese Clinical Trial Register. (Trial registration: ChiCTR-OOC-15006699. Registered 6 July 2015 - Retrospectively registered) http://www.chictr.org.cn/showproj.aspx?proj=11375.

Keywords: Coronary heart disease; Machine learning; Nonlaboratory parameters; Risk assessment; Screening.