Gene Expression and Metadata Based Identification of Key Genes for Hepatocellular Carcinoma Using Machine Learning and Statistical Models

IEEE/ACM Trans Comput Biol Bioinform. 2023 Nov-Dec;20(6):3786-3799. doi: 10.1109/TCBB.2023.3322753. Epub 2023 Dec 25.

Abstract

Biomarkers associated with hepatocellular carcinoma (HCC) are of great importance for better understanding biological response mechanisms to internal or external intervention. This study aimed to identify key candidate genes for HCC using machine learning (ML) and statistics-based bioinformatics models. Differentially expressed genes (DEGs) were identified using limma, and the genes common to the DEGs identified from four datasets were then selected. Protein-protein interaction networks were then constructed using STRING, and Cytoscape was used to determine hub genes, significant modules, and their associated genes. In parallel, three ML-based techniques, namely support vector machine (SVM), least absolute shrinkage and selection operator-logistic regression (LASSO-LR), and partial least squares-discriminant analysis (PLS-DA), were applied to determine the discriminative genes of HCC from the common DEGs. Moreover, metadata of hub genes were compiled by listing all hub genes reported in existing studies, so that other findings could be incorporated into the analysis. Finally, seven key candidate genes (ASPM, CCNB1, CDK1, DLGAP5, KIF20A, MT1X, and TOP2A) were identified by intersecting the hub genes, the significant module genes, the discriminative genes from SVM, LASSO-LR, and PLS-DA, and the meta hub genes from existing studies. Three additional independent test datasets were used to validate these seven key candidate genes using the area under the curve (AUC) computed from the receiver operating characteristic (ROC) curve.
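
The final two steps of the abstract (intersecting the candidate gene lists, then scoring a gene by AUC) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `intersect_gene_sets` and `auc_from_scores`, the placeholder gene labels (`GENE_A`, ...), and the toy scores are all assumptions made for illustration, and the AUC is computed with the rank-based (Mann-Whitney) formulation rather than whatever software the study used.

```python
from functools import reduce


def intersect_gene_sets(gene_sets):
    """Return the genes present in every list of a {source_name: genes} mapping."""
    sets = [set(genes) for genes in gene_sets.values()]
    return reduce(set.intersection, sets) if sets else set()


def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for tumour samples, 0 for controls (hypothetical coding).
    scores: one numeric value per sample, e.g. a gene's expression level.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    correct = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


# Toy input with placeholder gene labels; these are NOT the study's gene lists.
toy_sets = {
    "hub": {"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
    "module": {"GENE_A", "GENE_B", "GENE_C"},
    "svm": {"GENE_A", "GENE_B", "GENE_E"},
    "lasso_lr": {"GENE_A", "GENE_B", "GENE_C"},
    "pls_da": {"GENE_A", "GENE_B", "GENE_D"},
    "meta_hub": {"GENE_A", "GENE_B", "GENE_F"},
}
key_genes = intersect_gene_sets(toy_sets)
print(sorted(key_genes))

# Toy labels and scores for one hypothetical gene in a test dataset.
toy_auc = auc_from_scores([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.5, 0.3, 0.2])
print(round(toy_auc, 3))
```

A gene survives the intersection only if every source list contains it, which is why adding more evidence sources can only shrink the candidate set. In practice a library routine such as scikit-learn's `roc_auc_score` would normally replace the hand-written AUC.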

MeSH terms

  • Carcinoma, Hepatocellular* / genetics
  • Computational Biology / methods
  • Gene Expression
  • Gene Expression Profiling
  • Gene Expression Regulation, Neoplastic
  • Gene Regulatory Networks / genetics
  • Humans
  • Liver Neoplasms* / genetics
  • Metadata
  • Models, Statistical