A novel machine learning derived RNA-binding protein gene-based score system predicts prognosis of hepatocellular carcinoma patients

PeerJ. 2021 Dec 20;9:e12572. doi: 10.7717/peerj.12572. eCollection 2021.

Abstract

Background: Although the expression of RNA-binding protein (RBP) genes is altered in hepatocellular carcinoma (HCC) and is associated with tumor progression, no comprehensive study has integrated multiple large cohorts. HCC-associated RBP genes therefore need to be identified more accurately, and their clinical value needs to be explored further.

Methods: First, we used the robust rank aggregation (RRA) algorithm to extract HCC-associated RBP genes from nine HCC microarray datasets and verified them in The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC) cohort and the International Cancer Genome Consortium (ICGC) Japanese liver cancer (ICGC-LIRI-JP) cohort. We also analyzed copy number variation (CNV), single-nucleotide variant (SNV), and promoter-region methylation data for these HCC-associated RBP genes. Using the random forest algorithm, we constructed an RBP gene-based prognostic score system (RBP-score). We then evaluated the ability of the RBP-score to predict patient prognosis and analyzed its relationships with other clinical characteristics of the patients.
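The abstract names the RRA algorithm without describing it. The following minimal sketch assumes the rho-score formulation of Kolde et al. (2012): each gene's rank in a dataset is normalized to (0, 1] by dividing by the list length, the normalized ranks are sorted, each one is compared with the distribution of the corresponding order statistic of uniform random ranks, and the smallest of these probabilities (rho) is Bonferroni-corrected. The function names `beta_k`, `rra_rho`, and `rra_pvalue` are hypothetical, and details such as how genes missing from some datasets are handled are deliberately left out; this is not the authors' pipeline.

```python
from collections.abc import Sequence
from math import comb


def beta_k(k: int, n: int, r: float) -> float:
    """P(U_(k) <= r) for the k-th smallest of n i.i.d. Uniform(0, 1) ranks.

    U_(k) follows Beta(k, n - k + 1); its CDF at r equals the probability
    that at least k of the n uniform ranks fall at or below r.
    """
    return sum(comb(n, j) * r**j * (1 - r) ** (n - j) for j in range(k, n + 1))


def rra_rho(norm_ranks: Sequence[float]) -> float:
    """Rho score: the minimum beta_k over the sorted normalized ranks of one gene."""
    r = sorted(norm_ranks)
    n = len(r)
    return min(beta_k(k, n, r[k - 1]) for k in range(1, n + 1))


def rra_pvalue(norm_ranks: Sequence[float]) -> float:
    """Bonferroni-style correction of rho for the n order statistics tested."""
    return min(1.0, rra_rho(norm_ranks) * len(norm_ranks))


# Hypothetical normalized ranks of two genes across nine datasets.
consistent_gene = [0.01, 0.02, 0.01, 0.03, 0.02, 0.01, 0.05, 0.02, 0.04]
scattered_gene = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(rra_pvalue(consistent_gene), rra_pvalue(scattered_gene))
```

A gene that sits near the top of every list receives a very small rho, whereas a gene whose ranks are spread across the whole range does not. This is why RRA favors genes with consistent expression patterns across datasets rather than genes with a large effect in only one dataset.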

Results: The RRA algorithm identified 30 RBP mRNAs with consistent expression patterns across the nine HCC microarray datasets; these 30 genes were defined as HCC-associated RBP genes. Their mRNA expression patterns were further verified in the TCGA-LIHC and ICGC-LIRI-JP cohorts. Among these 30 RBP genes, some showed significant copy number gain or loss, while others showed differences in promoter-region methylation levels. Some of the RBP genes were prognostic risk factors and others were protective factors. We extracted 10 key HCC-associated RBP genes using the random forest algorithm and constructed the RBP-score system. The RBP-score effectively predicted the overall survival (OS) and disease-free survival (DFS) of HCC patients and was associated with tumor-node-metastasis (TNM) stage, α-fetoprotein (AFP) level, and metastasis risk. The clinical value of the RBP-score was validated in datasets from different platforms. Cox analysis suggested that a high RBP-score was an independent risk factor for poor prognosis in HCC patients. We also established a combined RBP-score + TNM LASSO-Cox model that predicted prognosis more accurately.
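The abstract does not say which metric was used to judge how effectively the RBP-score predicted OS and DFS. Harrell's concordance index (C-index) is one standard way to measure how well a risk score ranks patients by survival, so a minimal sketch is given here under stated assumptions. Each patient has a follow-up time, an event indicator (1 = death or recurrence observed, 0 = censored), and a score in which higher values mean higher risk. The function name `harrell_c_index` and the example data are hypothetical, and pairs with tied follow-up times are simply skipped for brevity.

```python
from collections.abc import Sequence


def harrell_c_index(
    time: Sequence[float], event: Sequence[int], score: Sequence[float]
) -> float:
    """Fraction of comparable pairs in which the higher-risk score failed first.

    A pair (i, j) is comparable when patient i had an observed event and a
    strictly shorter follow-up time than patient j. Score ties count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored time cannot anchor a comparable pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if score[i] > score[j]:
                    concordant += 1.0
                elif score[i] == score[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort: survival in months, event flags, and risk scores.
months = [5.0, 12.0, 20.0, 31.0]
died = [1, 1, 0, 1]
risk = [2.4, 1.9, 0.7, 0.3]
print(harrell_c_index(months, died, risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Because it uses only the ordering of scores, the C-index can compare a single score against a combined model such as a score + TNM model on the same patients.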

Conclusion: The RBP-score system constructed from HCC-associated RBP genes is a simple and highly effective prognostic evaluation tool. It is applicable to different subgroups of HCC patients and performs consistently across platforms. Combining the RBP-score with the TNM staging system or other clinical parameters can yield even greater clinical benefit. In addition, the identified HCC-associated RBP genes may serve as novel targets for HCC treatment.

Keywords: Hepatocellular carcinoma; Prognosis; RNA-binding protein; Random forest algorithm; Survival.

Grants and funding

This study was supported by the Science and Technology Innovation Commission of Shenzhen (KQJSCX20180321164801762) and the Shenzhen Science and Technology Project (JCYJ20180305164841126). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.