Bioinformatics identification of lncRNA biomarkers associated with the progression of esophageal squamous cell carcinoma

Mol Med Rep. 2019 Jun;19(6):5309-5320. doi: 10.3892/mmr.2019.10213. Epub 2019 May 2.

Abstract

The poor outcome of patients with esophageal squamous cell carcinoma (ESCC) highlights the importance of identifying novel, effective prognostic biomarkers. Long non‑coding RNAs (lncRNAs) serve regulatory roles in various types of cancer. The aim of the present study was to investigate the lncRNA expression profile in ESCC and to identify lncRNAs associated with the prognosis of ESCC by performing comprehensive bioinformatics analyses. The RNA‑sequencing (Seq) expression dataset GSE53625, generated from ESCC samples, was used as the training dataset. Additional RNA‑Seq datasets from ESCC samples were downloaded from The Cancer Genome Atlas and used as the validation dataset. Data were screened using the limma package, and lncRNAs differentially expressed between early‑ and late‑stage ESCC were identified. A random forest algorithm was used to select the optimal lncRNA biomarkers, which were then analyzed using the support vector machine (SVM) algorithm in R. The identified lncRNA biomarkers were examined in the validation dataset by bidirectional hierarchical clustering and with an SVM classifier. Subsequently, univariate and multivariate Cox regression analyses were performed to assess the ability of the lncRNAs to predict the survival rate of patients with ESCC. In the training dataset, 259 lncRNAs deregulated between early‑ and advanced‑stage ESCC were identified. Further bioinformatics analyses identified a nine‑lncRNA signature comprising AC098973, AL133493, RP11‑51M24, RP11‑317N8, RP11‑834C11, RP11‑69C17, LINC00471, LINC01193 and RP1‑124C. This nine‑lncRNA signature predicted the tumor stage and patient survival rate with high reliability and accuracy in both the training and validation datasets. Furthermore, these nine lncRNA biomarkers were primarily involved in regulating the cell cycle and DNA replication, processes previously identified to be associated with the progression of ESCC.
The nine‑lncRNA signature was associated with the tumor stage and could be used as a predictor of the survival rate of patients with ESCC.
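
To illustrate how a multi‑lncRNA signature can be related to patient survival, the following is a minimal sketch rather than the authors' implementation (the study itself was performed in R). It assumes a linear risk score (the sum of Cox coefficients multiplied by expression values), a median split into high‑ and low‑risk groups, and a Kaplan‑Meier estimate per group; none of these details are stated in the abstract. The coefficients, expression values, follow‑up times and the function names `risk_scores` and `kaplan_meier` are all hypothetical toy values chosen for illustration.

```python
from collections import Counter
from statistics import median


def risk_scores(expression, coefs):
    """Linear risk score per patient: sum of coefficient * expression."""
    return [sum(c * x for c, x in zip(coefs, row)) for row in expression]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival)] at each observed event time.

    times  -- follow-up duration per patient
    events -- 1 if death was observed, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: 6 patients, 2 lncRNAs (not taken from the study).
coefs = [0.5, -0.3]                      # assumed Cox coefficients
expression = [[2.0, 1.0], [0.5, 2.0], [3.0, 0.5],
              [1.0, 1.0], [0.2, 3.0], [2.5, 2.0]]
times = [6, 30, 4, 24, 36, 10]           # follow-up in months
events = [1, 0, 1, 1, 0, 1]              # 1 = death, 0 = censored

scores = risk_scores(expression, coefs)
cutoff = median(scores)
high = [i for i, s in enumerate(scores) if s > cutoff]
low = [i for i, s in enumerate(scores) if s <= cutoff]

km_high = kaplan_meier([times[i] for i in high], [events[i] for i in high])
km_low = kaplan_meier([times[i] for i in low], [events[i] for i in low])

print("high-risk curve:", km_high)
print("low-risk curve:", km_low)
```

In this toy example the high‑risk group's estimated survival drops to zero by month 10, whereas the low‑risk group has a single observed event at month 24. With real data, the two curves would typically be compared using a log‑rank test.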

MeSH terms

  • Aged
  • Area Under Curve
  • Biomarkers, Tumor / genetics*
  • Biomarkers, Tumor / metabolism
  • Cluster Analysis
  • Computational Biology / methods*
  • Disease Progression
  • Esophageal Neoplasms / diagnosis*
  • Esophageal Neoplasms / genetics
  • Esophageal Neoplasms / pathology
  • Esophageal Squamous Cell Carcinoma / diagnosis*
  • Esophageal Squamous Cell Carcinoma / genetics
  • Esophageal Squamous Cell Carcinoma / pathology
  • Female
  • Humans
  • Kaplan-Meier Estimate
  • Male
  • Middle Aged
  • Neoplasm Staging
  • Proportional Hazards Models
  • RNA, Long Noncoding / metabolism*
  • ROC Curve
  • Regression Analysis
  • Support Vector Machine

Substances

  • Biomarkers, Tumor
  • RNA, Long Noncoding