Identification of cancer types from gene expressions using learning techniques

Comput Methods Biomech Biomed Engin. 2023 Oct-Dec;26(16):1951-1965. doi: 10.1080/10255842.2022.2160243. Epub 2022 Dec 23.

Abstract

Cancer is now a leading cause of death worldwide. Early detection and prediction of the cancer type are important for a patient's well-being. Functional genomic data have recently been used for effective, early detection of cancer. Previous research shows that using microarray data for cancer prediction raises two main problems: high dimensionality and limited sample size. Researchers have applied numerous statistical and machine learning methods to classify cancer types, but limitations remain that make cancer classification difficult. Deep Learning (DL) and Convolutional Neural Networks (CNN) have proven effective for analyzing unstructured data, including gene expression data. In the proposed method, gene expression data for five types of cancer are collected from The Cancer Genome Atlas (TCGA). Prominent features are selected using a hybrid Particle Swarm Optimization (PSO) and Random Forest (RF) algorithm, followed by Principal Component Analysis (PCA) for dimensionality reduction. Finally, a blend of a Convolutional Neural Network (CNN) and Bi-directional Long Short-Term Memory (Bi-LSTM) is used to predict the cancer type. Experimental results show that the proposed method achieves an accuracy of 96.89%, outperforming existing work.
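
The PSO-based feature-selection step can be illustrated with a minimal sketch of binary PSO, in which each particle is a bit mask over features. This is not the authors' implementation: the abstract does not say how PSO and Random Forest are combined, so the sketch swaps in nearest-centroid accuracy as the fitness function in place of a Random Forest, runs on synthetic data rather than TCGA, and omits the PCA and CNN/Bi-LSTM stages. All names and hyperparameters (`make_data`, `fitness`, `binary_pso`, `w`, `c1`, `c2`, `v_max`, `penalty`) are assumptions made for illustration.

```python
import math
import random


def make_data(n_per_class=20, n_features=30, n_informative=5, seed=0):
    """Synthetic stand-in for expression data (assumption, not TCGA):
    only the first n_informative features differ between the two classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in range(n_informative):
                row[j] += 2.0 * label  # shift informative features for class 1
            X.append(row)
            y.append(label)
    return X, y


def fitness(mask, X_tr, y_tr, X_te, y_te, penalty=0.01):
    """Held-out nearest-centroid accuracy on the selected features,
    minus a small penalty proportional to the fraction of features kept."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return -1.0  # an empty feature set is never acceptable
    centroids = {}
    for label in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_te, y_te):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        correct += pred == t
    return correct / len(y_te) - penalty * len(cols) / len(mask)


def binary_pso(fit, n_features, n_particles=12, n_iter=25,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=1):
    """Binary PSO: velocities are real-valued, and each bit is resampled
    with probability sigmoid(velocity) of being switched on."""
    rng = random.Random(seed)
    pos = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_score = [fit(p) for p in pos]
    g = max(range(n_particles), key=p_score.__getitem__)
    g_best, g_score = p_best[g][:], p_score[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_features):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (p_best[i][j] - pos[i][j])
                     + c2 * rng.random() * (g_best[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))  # clamp velocity
                pos[i][j] = rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j]))
            score = fit(pos[i])
            if score > p_score[i]:
                p_best[i], p_score[i] = pos[i][:], score
                if score > g_score:
                    g_best, g_score = pos[i][:], score
    return g_best, g_score


X, y = make_data()
X_tr, y_tr = X[0::2], y[0::2]  # even rows for fitting centroids
X_te, y_te = X[1::2], y[1::2]  # odd rows for scoring
mask, score = binary_pso(
    lambda m: fitness(m, X_tr, y_tr, X_te, y_te), n_features=len(X[0])
)
selected = [j for j, keep in enumerate(mask) if keep]
print("selected feature indices:", selected)
print("fitness:", round(score, 3))
```

In a pipeline closer to the one described, the fitness function would presumably score each mask with a Random Forest, and the selected columns would then be passed to PCA before classification.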

Keywords: Bi-LSTM (Bidirectional-Long Short Term Memory); Cancer prediction; PCA (Principal Component Analysis); Particle Swarm Optimization (PSO); Random Forest (RF); gene expression (GE).

MeSH terms

  • Gene Expression
  • Humans
  • Machine Learning
  • Neoplasms* / diagnosis
  • Neoplasms* / genetics
  • Neural Networks, Computer
  • Principal Component Analysis