Development and validation of questionnaire-based machine learning models for predicting all-cause mortality in a representative population of China

Ziyi Li; Na Yang; Liyun He; Jialu Wang; Fan Ping; Wei Li; Lingling Xu; Huabing Zhang; Yuxiu Li

doi:10.3389/fpubh.2023.1033070

Front Public Health. 2023 Jan 27:11:1033070. doi: 10.3389/fpubh.2023.1033070. eCollection 2023.

Authors

Ziyi Li¹, Na Yang¹, Liyun He¹, Jialu Wang¹, Fan Ping¹, Wei Li¹, Lingling Xu¹, Huabing Zhang¹, Yuxiu Li¹

Affiliation

¹ Key Laboratory of Endocrinology of National Health Commission, Department of Endocrinology, Translation Medicine Center, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.

Abstract

Background: Previously developed mortality prediction models have limited applicability to the Chinese population. A questionnaire-based prediction model is therefore of great importance for its accuracy and convenience in clinical practice.

Methods: Two national cohorts, namely, the China Health and Nutrition Survey (8,355 individuals older than 18) and the China Health and Retirement Longitudinal Study (12,711 individuals older than 45), were used for model development and validation. One hundred and fifty-nine variables were compiled to generate predictions. The Cox regression model and six machine learning (ML) models were used to predict all-cause mortality. Finally, a simple questionnaire-based ML prediction model was developed using the best algorithm and validated.

Results: In the internal validation set, all the ML models performed better than the traditional Cox model in predicting 6-year mortality, and the random survival forest (RSF) model performed best. The questionnaire-based ML model, which only included 20 variables, achieved a C-index of 0.86 (95% CI: 0.80-0.92). On external validation, the simple questionnaire-based model achieved C-indices of 0.82 (95% CI: 0.77-0.87), 0.77 (95% CI: 0.75-0.79), and 0.79 (95% CI: 0.77-0.81) in predicting 2-, 9-, and 11-year mortality, respectively.
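
For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is an illustration only, not the authors' implementation: the function name `concordance_index` and the toy follow-up times, event flags, and risk scores are assumptions made up for this example and are not taken from either cohort. Pairs with tied follow-up times are skipped for simplicity, which is one of several common conventions.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: the share of comparable pairs ranked correctly.

    times       -- follow-up time for each subject
    events      -- 1 if death was observed, 0 if the subject was censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore pairs with tied times
        # Let `a` be the subject with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject was censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier death received the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data (years of follow-up, death indicator, predicted risk).
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.6, 0.3, 0.1]

print(concordance_index(toy_times, toy_events, toy_risk))  # 0.875
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a C-index of 0.86 means that in roughly 86% of comparable pairs the person who died earlier had been assigned the higher predicted risk.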

Conclusions: In this prospective population-based study, a model based on the RSF analysis performed best among all models. Furthermore, there was no significant difference between the prediction performance of the questionnaire-based ML model, which only included 20 variables, and that of the model with all variables (including laboratory variables). The simple questionnaire-based ML prediction model, which needs to be further explored, is of great importance for its accuracy and suitability to the Chinese general population.

Keywords: machine learning; mortality; personalized prediction; prediction model; questionnaire-based.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

China / epidemiology
Humans
Longitudinal Studies
Machine Learning*
Prognosis
Prospective Studies
Surveys and Questionnaires

Grants and funding

This work was supported by grants from the Beijing Municipal Natural Science Foundation (No. M22014), the National High Level Hospital Clinical Research Funding (2022-PUMCH-B-015), the CAMS Innovation Fund for Medical Sciences (No. 2021-1-I2M-002), the National Natural Science Foundation of China (No. 91846106), and the Non-profit Central Research Institute Fund of the Chinese Academy of Medical Sciences (2019XK320029).