Prediction of Fatty Liver Disease in a Chinese Population Using Machine-Learning Algorithms

Diagnostics (Basel). 2023 Mar 18;13(6):1168. doi: 10.3390/diagnostics13061168.

Abstract

Background: Fatty liver disease (FLD) is an important risk factor for liver cancer and cardiovascular disease and can impose a significant social and economic burden. However, there is currently no nationwide epidemiological survey of FLD in China, which makes early FLD screening crucial for the Chinese population. Liver biopsy and abdominal ultrasound, the preferred methods for FLD diagnosis, are not practical for primary medical institutions. Therefore, this study aimed to develop machine learning (ML) models for screening individuals at high risk of FLD and to provide a new perspective on early FLD diagnosis.

Methods: This study included a total of 30,574 individuals between the ages of 18 and 70 who underwent abdominal ultrasound and the related clinical examinations. Among them, 3474 individuals were diagnosed with FLD by abdominal ultrasound. We used 11 indicators to build eight classification models to predict FLD. Predictive performance was evaluated by the area under the curve, sensitivity, specificity, positive predictive value, negative predictive value, and kappa value. Feature importance was assessed by Shapley values or by the root mean square error loss after permutation.
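
The evaluation metrics named above can all be derived from a binary confusion matrix and a set of predicted scores. The following is a minimal sketch in plain Python, not the authors' code: the function names (`screening_metrics`, `pairwise_auc`) and the counts in the usage note are hypothetical and are not taken from the study. The area under the curve is assumed here to be the area under the receiver operating characteristic curve, computed as the fraction of (FLD, non-FLD) pairs that the model ranks correctly.

```python
def screening_metrics(tp, fp, fn, tn):
    """Threshold-based metrics from confusion-matrix counts.

    tp/fn: FLD cases predicted positive/negative.
    fp/tn: non-FLD individuals predicted positive/negative.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column marginals.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((tn + fn) / n) * ((tn + fp) / n)
    p_chance = p_yes + p_no
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": accuracy,
        # Cohen's kappa: agreement beyond chance, scaled to [-1, 1].
        "kappa": (accuracy - p_chance) / (1 - p_chance),
    }


def pairwise_auc(scores_pos, scores_neg):
    """ROC AUC as the probability that a positive outscores a negative.

    Ties count as half a correct ranking. Quadratic in the sample size,
    so this form is only meant for illustration on small inputs.
    """
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

For example, with hypothetical counts `screening_metrics(tp=40, fp=10, fn=20, tn=130)`, the accuracy is 0.85 and the kappa value is 0.625. Because FLD cases make up a minority of this cohort (3474 of 30,574), kappa and the predictive values are informative alongside accuracy, which can be high even for a model that rarely predicts FLD.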

Results: Among the eight ML models, the extreme gradient boosting (XGBoost) model achieved the highest prediction accuracy, 89.77%. Feature importance analysis showed that body mass index, triglyceride, and alanine aminotransferase played important roles in FLD prediction.
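
As an illustration of how a permutation-based importance ranking of this kind can be obtained, the sketch below shuffles one feature at a time and records how much the root mean square error (RMSE) of the predictions increases. It is a generic version of the technique under stated assumptions, not the authors' implementation: `predict` stands for any fitted model that maps one row of features to a predicted probability, and the names `rmse` and `permutation_importance`, the number of repeats, and the seed are all hypothetical. The Shapley-value variant mentioned in the Methods is not shown.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between labels and predicted probabilities."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when each feature column is shuffled.

    predict: callable taking one row (a list of feature values).
    X: list of rows, each a list; y: list of 0/1 labels.
    Returns one importance value per feature, in column order.
    """
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the outcome.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = rmse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A feature the model relies on heavily produces a large RMSE increase when shuffled, whereas a feature the model ignores produces an increase of zero. Sorting the returned values gives a ranking of the kind reported above.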

Conclusion: XGBoost improves the efficiency and reduces the cost of large-scale FLD screening.

Keywords: XGBoost; early screening; fatty liver disease; machine learning.