Convolutional neural network for biomarker discovery for triple negative breast cancer with RNA sequencing data

Heliyon. 2023 Mar 23;9(4):e14819. doi: 10.1016/j.heliyon.2023.e14819. eCollection 2023 Apr.

Abstract

Triple negative breast cancers (TNBCs) are tumors with a poor treatment response and prognosis. In this study, we propose a new approach, candidate extraction from convolutional neural network (CNN) elements (CECE), for discovery of biomarkers for TNBCs. We used the GSE96058 and GSE81538 datasets to build a CNN model to classify TNBCs and non-TNBCs and used the model to make TNBC predictions for two additional datasets, the cancer genome atlas (TCGA) breast cancer RNA sequencing data and the data from Fudan University Shanghai Cancer Center (FUSCC). Using correctly predicted TNBCs from the GSE96058 and TCGA datasets, we calculated saliency maps for these subjects and extracted the genes that the CNN model used to separate TNBCs from non-TNBCs. Among the TNBC signature patterns that the CNN models learned from the training data, we found a set of 21 genes that can classify TNBCs into two major classes, or CECE subtypes, with distinct overall survival rates (P = 0.0074). We replicated this subtype classification in the FUSCC dataset using the same 21 genes, and the two subtypes had similar differential overall survival rates (P = 0.0490). When all TNBCs were combined from the 3 datasets, the CECE II subtype had a hazard ratio of 1.94 (95% CI, 1.25-3.01; P = 0.0032). The results demonstrate that the spatial patterns learned by the CNN models can be utilized to discover interacting biomarkers otherwise unlikely to be identified by traditional approaches.

Keywords: Biomarker discovery; Convolutional neural network; Machine learning; RNA sequencing; Triple negative breast cancer.