Six-gene prognostic signature for non-alcoholic fatty liver disease susceptibility using machine learning

Medicine (Baltimore). 2024 May 10;103(19):e38076. doi: 10.1097/MD.0000000000038076.

Abstract

Background: Nonalcoholic fatty liver disease (NAFLD) is a common liver disease worldwide, and its impact on human health is expected to keep increasing. Genetic susceptibility is an important factor in its onset and progression. However, there are no reliable gene-based methods for predicting NAFLD susceptibility in the general population.

Methods: RNA sequencing data related to NAFLD were analyzed with the "limma" package in R. Differentially expressed genes were obtained through a preliminary intersection screen. Core genes were then identified by building and comparing 4 machine learning models, and a prediction model for NAFLD was constructed from them. The model's effectiveness was evaluated, and its applicability and reliability were verified. Finally, we performed gene correlation, biological function, and immune infiltration analyses.
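The differential-expression screen can be pictured with a small, dependency-free sketch. It does not reproduce limma's empirical Bayes moderated t-statistics. Instead, it assumes per-gene log2 fold changes and raw p-values are already available (for example, from limma's `topTable`). It then applies Benjamini-Hochberg correction and intersects the surviving gene sets from two datasets. Several details are assumptions, because the abstract does not state them: the thresholds (|log2FC| >= 1, adjusted p < 0.05), the choice of two datasets as the sets being intersected, and the helper names `benjamini_hochberg` and `select_degs`. The gene names and values are toy placeholders, not data from the study.

```python
from typing import Dict, List, Set, Tuple

# Per-gene summary statistics: gene -> (log2 fold change, raw p-value).
GeneStats = Dict[str, Tuple[float, float]]


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(stats: GeneStats, lfc_cut: float = 1.0, fdr_cut: float = 0.05) -> Set[str]:
    """Keep genes with |log2FC| >= lfc_cut and BH-adjusted p < fdr_cut (assumed thresholds)."""
    genes = list(stats)
    adjusted = benjamini_hochberg([stats[g][1] for g in genes])
    return {
        g for g, q in zip(genes, adjusted)
        if abs(stats[g][0]) >= lfc_cut and q < fdr_cut
    }


# Toy values for hypothetical genes, not data from the study.
dataset_a: GeneStats = {
    "GENE_A": (-1.8, 0.0001),
    "GENE_B": (2.1, 0.0004),
    "GENE_C": (0.4, 0.001),   # small fold change -> filtered out
    "GENE_D": (1.5, 0.20),    # not significant after adjustment
}
dataset_b: GeneStats = {
    "GENE_A": (-1.2, 0.002),
    "GENE_B": (1.1, 0.03),
    "GENE_E": (1.6, 0.001),
}

# Intersection screen: keep genes called differentially expressed in both datasets.
common = select_degs(dataset_a) & select_degs(dataset_b)
print(sorted(common))  # ['GENE_A', 'GENE_B']
```

The intersection deliberately favors genes that replicate across datasets, at the cost of discarding genes found in only one.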

Results: Comparison of the 4 machine learning algorithms identified the support vector machine (SVM) as the optimal model. Its top 6 ranked genes (CD247, S100A9, CSF3R, DIP2C, OXCT2 and PRAMEF16) were selected as predictive genes. The nomogram showed good reliability and effectiveness. Receiver operating characteristic (ROC) curves for the 6 genes showed high predictive value, suggesting that they play an essential role in NAFLD pathogenesis. Further immunological analysis showed that these 6 genes were closely associated with various immune cells and pathways.
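The per-gene ROC analysis can be illustrated with a minimal sketch. It relies on the standard equivalence between the area under the ROC curve and the normalized Mann-Whitney U statistic. The abstract does not say which software computed the curves. The function name `auc_rank` and the expression values below are hypothetical and are not taken from the study.

```python
from typing import Sequence


def auc_rank(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via its Mann-Whitney interpretation.

    AUC equals the probability that a randomly chosen case (label 1) has a
    higher score than a randomly chosen control (label 0); ties count 0.5.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    # Compare every case/control pair (O(n*m), fine for small cohorts).
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical expression values for one gene: 4 NAFLD samples, then 4 controls.
expression = [5.1, 4.8, 6.0, 3.9, 3.0, 4.0, 2.5, 3.5]
status = [1, 1, 1, 1, 0, 0, 0, 0]
print(auc_rank(expression, status))  # 0.9375
```

A gene expressed at lower levels in NAFLD gives an AUC below 0.5. Negating its expression values (equivalently, reporting 1 - AUC) puts it on the same scale as up-regulated genes.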

Conclusion: This study constructed an advanced and reliable prediction model based on 6 diagnostic gene markers to predict NAFLD susceptibility in the general population. The work also offers insights into potential targeted therapies.

MeSH terms

  • Calgranulin B / genetics
  • Female
  • Genetic Predisposition to Disease*
  • Humans
  • Machine Learning*
  • Male
  • Nomograms
  • Non-alcoholic Fatty Liver Disease* / diagnosis
  • Non-alcoholic Fatty Liver Disease* / genetics
  • Prognosis
  • ROC Curve
  • Reproducibility of Results

Substances

  • Calgranulin B