Development of an interpretable machine learning model associated with genetic indicators to identify Yin-deficiency constitution

Chin Med. 2024 May 15;19(1):71. doi: 10.1186/s13020-024-00941-x.

Abstract

Background: Traditional Chinese Medicine (TCM) defines constitutions that are associated with susceptibility to corresponding diseases. Yin-deficiency constitution is one of the common constitutions and influences disease onset in a number of people in the Chinese population. Accurate identification of Yin-deficiency constitution is therefore important for disease prevention and treatment.

Methods: In this study, we recruited participants with Yin-deficiency constitution and participants with balanced constitution. The least absolute shrinkage and selection operator (LASSO) and logistic regression were used to analyze genetic predictors. Four machine learning models for Yin-deficiency constitution classification, each built on combinations of genetic indicators, were compared to identify the optimal model and features. SHapley Additive exPlanations (SHAP) was used to interpret the model.
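
As an illustration of the LASSO step described in the Methods, the following is a minimal sketch of L1-penalized logistic regression fitted by proximal gradient descent. It is not the authors' implementation, which the abstract does not specify. The data are synthetic, and the names `soft_threshold`, `fit_l1_logistic`, `lam` and the feature labels `gene_0` to `gene_4` are hypothetical. In practice a library such as scikit-learn or glmnet would normally be used instead of a hand-written solver.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam, lr=0.1, n_iter=300):
    """Proximal gradient descent for logistic loss plus lam * sum(|w_j|).

    The intercept is not penalized. Returns (weights, intercept).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        # Residual = predicted probability minus observed label.
        residuals = [
            sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
            for row, yi in zip(X, y)
        ]
        grad_b = sum(residuals) / n
        grad_w = [sum(r * row[j] for r, row in zip(residuals, X)) / n for j in range(d)]
        b -= lr * grad_b
        # Gradient step followed by soft-thresholding (the L1 proximal operator).
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Synthetic data (assumption): only the first two features influence the label.
random.seed(0)
n_samples, n_features = 200, 5
X = [[random.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [1 if random.random() < sigmoid(2.0 * row[0] - 2.0 * row[1]) else 0 for row in X]

weights, intercept = fit_l1_logistic(X, y, lam=0.05)
selected = [f"gene_{j}" for j, wj in enumerate(weights) if wj != 0.0]
print("features with non-zero coefficients:", selected)

# A penalty larger than any possible gradient magnitude zeroes every coefficient.
weights_heavy, _ = fit_l1_logistic(X, y, lam=100.0)
print("coefficients under a very large penalty:", weights_heavy)
```

Features whose coefficients remain non-zero at the chosen penalty are the ones carried forward as candidate predictors; the penalty strength itself would usually be chosen by cross-validation.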

Results: NFKBIA, BCL2A1 and CCL4 were the genetic indicators most strongly associated with Yin-deficiency constitution. Random forest with these three genetic predictors was the optimal model, with an area under the curve (AUC) of 0.937 (95% CI 0.844-1.000), a sensitivity of 0.870 and a specificity of 0.900. The SHAP method provided an intuitive explanation of the risk underlying individual predictions.
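
For readers less familiar with the metrics reported in the Results, the sketch below shows how sensitivity, specificity and AUC are computed from labels and predicted scores. The labels and scores are invented toy values, not study data, and the 0.5 decision threshold and the function names are assumptions made for illustration. The confidence interval is not computed here.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1/0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative case.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (assumption): 1 = Yin-deficiency, 0 = balanced constitution.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={area:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds.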

Conclusion: We constructed a machine learning model for classifying Yin-deficiency constitution and explained it with the SHAP method, providing an objective system for identifying Yin-deficiency constitution in TCM and guidance for clinicians.
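
SHAP attributes an individual prediction to its input features using Shapley values. With only three predictors, these values can be computed exactly by enumerating feature subsets, as in the minimal sketch below. This is not the `shap` library and not the authors' code: the function `shapley_values`, the toy linear model and the background samples are all hypothetical and chosen so the result can be checked by hand.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one instance x.

    Features outside a coalition are replaced by values from each background
    sample, and the model output is averaged over the background.
    """
    d = len(x)

    def value(coalition):
        total = 0.0
        for b in background:
            mixed = [x[j] if j in coalition else b[j] for j in range(d)]
            total += predict(mixed)
        return total / len(background)

    phis = []
    for i in range(d):
        others = [j for j in range(d) if j != i]
        phi = 0.0
        for size in range(d):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_model(z):
    """Hypothetical linear score over three features (not the study's model)."""
    return 0.5 * z[0] - 1.0 * z[1] + 2.0 * z[2]


background = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
x = [3.0, 1.0, 0.0]

phis = shapley_values(toy_model, x, background)
baseline = sum(toy_model(b) for b in background) / len(background)
print("Shapley values:", phis)
print("baseline + sum of values:", baseline + sum(phis), "prediction:", toy_model(x))
```

For a linear model each value reduces to the coefficient times the feature's deviation from the background mean, and the values always sum to the difference between the prediction and the baseline. That additivity is what lets a single prediction be read as a sum of per-feature contributions.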

Keywords: Yin-deficiency constitution; Constitution identification; Machine learning; Model interpretation; Prediction model; Traditional Chinese medicine.