WGCNA combined with machine learning to find potential biomarkers of liver cancer

Medicine (Baltimore). 2023 Dec 15;102(50):e36536. doi: 10.1097/MD.0000000000036536.

Abstract

The incidence of hepatocellular carcinoma (HCC) has been increasing in recent years. With the development of various detection technologies, machine learning has become an effective method for screening disease-characteristic genes. In this study, weighted gene co-expression network analysis (WGCNA) and machine learning were combined to find potential biomarkers of liver cancer, providing a new approach for future prediction, prevention, and personalized treatment. Differentially expressed genes were screened with the "limma" software package, using P < .05 and |log2 fold-change| > 1 as the criteria. These genes were then intersected with the module genes obtained by WGCNA to obtain the key module genes. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed on the key module genes. Three machine learning methods were then used to screen feature genes: least absolute shrinkage and selection operator (LASSO), support vector machine-recursive feature elimination (SVM-RFE), and random forest. The feature genes were verified in a validation set. The GeneMANIA (http://www.genemania.org) database was used to perform protein-protein interaction network analysis on the feature genes, and the SPIED3 database was used to find potential small-molecule drugs. Using the "limma" package and WGCNA, 187 genes associated with HCC were screened. Six feature genes (AADAT, APOF, GPC3, LPA, MASP1, and NAT2) were then selected by the random forest, LASSO, and SVM-RFE algorithms. These genes also differed significantly in the external dataset and followed the same trend as in the training set. Our findings may provide new insights into targets for the diagnosis, prevention, and treatment of HCC. AADAT, APOF, GPC3, LPA, MASP1, and NAT2 may be potential genes for the prediction, prevention, and treatment of liver cancer in the future.
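The gene-screening steps described in the abstract mostly reduce to a threshold filter followed by set intersections. The following is a minimal Python sketch of that bookkeeping. It assumes a differential-expression table has already been produced (for example by limma's topTable) and exported as gene / log2 fold-change / P-value records. The `DEResult` type, the helper function names, and the toy genes GENE_A to GENE_F are hypothetical illustrations, not data or code from the study. The sketch also assumes that the final feature genes are the overlap of the genes selected by all three machine learning methods; the abstract does not state explicitly how the three outputs were combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DEResult:
    """One row of a limma-style results table (hypothetical export format)."""
    gene: str
    log2_fc: float
    p_value: float


def screen_degs(results, p_cut=0.05, lfc_cut=1.0):
    """Keep genes with P < p_cut and |log2 fold-change| > lfc_cut."""
    return {r.gene for r in results if r.p_value < p_cut and abs(r.log2_fc) > lfc_cut}


def key_module_genes(degs, module_genes):
    """Intersect the DEGs with the genes of the disease-associated WGCNA module(s)."""
    return set(degs) & set(module_genes)


def consensus_features(*selections):
    """Genes selected by every method (assumption: the final set is the intersection)."""
    sets = [set(s) for s in selections]
    return set.intersection(*sets) if sets else set()


# Toy example with made-up values, for illustration only.
results = [
    DEResult("GENE_A", 2.3, 0.001),   # kept: significant, strongly up-regulated
    DEResult("GENE_B", -1.8, 0.02),   # kept: significant, down-regulated
    DEResult("GENE_C", 0.4, 0.001),   # dropped: |log2FC| too small
    DEResult("GENE_D", 3.0, 0.2),     # dropped: not significant
    DEResult("GENE_E", -1.2, 0.04),   # kept
]
degs = screen_degs(results)
key = key_module_genes(degs, ["GENE_A", "GENE_E", "GENE_F"])

# Hypothetical outputs of LASSO, SVM-RFE and random forest on the key module genes.
lasso_sel = {"GENE_A", "GENE_E"}
svm_rfe_sel = {"GENE_A", "GENE_E"}
rf_sel = {"GENE_A"}
final = consensus_features(lasso_sel, svm_rfe_sel, rf_sel)

print(sorted(degs))   # ['GENE_A', 'GENE_B', 'GENE_E']
print(sorted(key))    # ['GENE_A', 'GENE_E']
print(sorted(final))  # ['GENE_A']
```

Taking the intersection is the strictest way to combine the three selectors. A union, or a vote threshold such as "selected by at least 2 of 3 methods", would yield a larger candidate set.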

MeSH terms

  • Algorithms
  • Arylamine N-Acetyltransferase*
  • Biomarkers
  • Carcinoma, Hepatocellular* / diagnosis
  • Carcinoma, Hepatocellular* / genetics
  • Glypicans
  • Humans
  • Liver Neoplasms* / diagnosis
  • Liver Neoplasms* / genetics
  • Machine Learning

Substances

  • Biomarkers
  • NAT2 protein, human
  • Arylamine N-Acetyltransferase
  • GPC3 protein, human
  • Glypicans