Screening key lncRNAs for human lung adenocarcinoma based on machine learning and weighted gene co-expression network analysis

Cancer Biomark. 2019;25(4):313-324. doi: 10.3233/CBM-190225.

Abstract

Background: Lung adenocarcinoma (LUAD) accounts for a significant proportion of lung cancers, and few diagnostic and therapeutic targets are available for LUAD because specific biomarkers are lacking. The aim of this study was to identify key long non-coding RNAs (lncRNAs) for LUAD.

Methods: The lncRNA and mRNA expression profiles of a large group of patients with LUAD were obtained from The Cancer Genome Atlas (TCGA). Differentially expressed lncRNAs (DElncRNAs) and mRNAs (DEmRNAs) were identified. The optimal diagnostic lncRNA biomarkers for LUAD were identified using a feature selection procedure and classification models. We established three classification models (random forests, decision tree and support vector machine [SVM]) to distinguish LUAD from normal tissues. lncRNA-mRNA co-expression networks were constructed and modules were identified by weighted gene co-expression network analysis (WGCNA). Functional annotation of the pink and green modules was performed. The expression of selected DElncRNAs was validated by qRT-PCR.
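The co-expression step can be illustrated with a minimal sketch. It assumes the unsigned WGCNA adjacency, in which the absolute Pearson correlation between two expression profiles is raised to a soft-threshold power. The power `beta = 6`, the gene names and the toy expression values are illustrative assumptions, not values from this study, and the code is not the authors' pipeline.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency |cor(x_i, x_j)| ** beta for every ordered pair of distinct genes."""
    names = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in names
        for h in names
        if g != h
    }


# Hypothetical expression profiles across five samples (assumed data).
toy_expr = {
    "lncRNA_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "mRNA_B": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with lncRNA_A
    "mRNA_C": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly correlated with lncRNA_A
}

adjacency = soft_threshold_adjacency(toy_expr)
for pair, weight in sorted(adjacency.items()):
    print(pair, round(weight, 6))
```

Raising the correlation to a power keeps strong co-expression links near 1 and pushes weak ones toward 0, which is what lets modules of tightly co-expressed lncRNAs and mRNAs stand out in the network.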

Results: A total of 1364 DEmRNAs (468 down-regulated and 896 up-regulated) and 260 DElncRNAs (88 down-regulated and 172 up-regulated) between LUAD and normal tissue were obtained. LANCL1-AS1, MIR3945HG, LINC01270, RP5-1061H20.4, BLACAT1, LINC01703, CTD-2227E11.1 and RP1-244F24.1 were identified as optimal diagnostic lncRNA biomarkers for LUAD. The areas under the curve (AUC) of the random forests, decision tree and SVM models were 0.999, 0.937 and 0.999, respectively. Specificity and sensitivity were 98.3% and 99.8% for the random forests model, 93.2% and 99% for the decision tree model, and 100% and 98.4% for the SVM model. Co-expression network analysis showed that RP11-389C8.2, CTD-2510F5.4 and TMPO-AS1 were co-expressed with 44, 242 and 241 mRNAs, respectively. Cell cycle, DNA replication and the p53 signaling pathway were three significantly enriched pathways. The qRT-PCR results and the validation in the GSE32863 and GSE104854 datasets were generally consistent with our integrated analysis.
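For readers unfamiliar with the reported metrics, the following minimal sketch shows how sensitivity, specificity and AUC are conventionally computed for a binary tumor-versus-normal classifier. The labels, scores and the 0.5 decision threshold are made-up assumptions for illustration and do not reproduce the study's data or results.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, with 1 = tumor and 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical classifier output for four tumor and four normal samples (assumed data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={area:.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why two models can share an AUC yet differ in the other two figures.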

Conclusion: Our study identified eight DElncRNAs as potential diagnostic biomarkers of LUAD. Functional annotation of the green module provided new evidence for exploring the precise roles of lncRNAs in LUAD.

Keywords: Lung adenocarcinoma; long non-coding RNAs; machine learning; weighted gene co-expression network analysis.

MeSH terms

  • Adenocarcinoma of Lung / genetics*
  • Female
  • Gene Expression Profiling / methods*
  • Gene Regulatory Networks / genetics*
  • Humans
  • Machine Learning / standards*
  • Male
  • RNA, Long Noncoding / metabolism*

Substances

  • RNA, Long Noncoding