Feature selection with ensemble learning for prostate cancer diagnosis from microarray gene expression

Health Informatics J. 2021 Jan-Mar;27(1):1460458221989402. doi: 10.1177/1460458221989402.

Abstract

Cancer diagnosis using machine learning algorithms is one of the main research topics in computer-based medical science. Prostate cancer is considered one of the leading causes of death worldwide. Analyzing microarray gene expression data with machine learning and soft computing algorithms is a useful tool for detecting prostate cancer in medical diagnosis. Although traditional machine learning methods have been applied successfully to prostate cancer detection, the large number of attributes combined with the small sample size of microarray data remains a challenge that limits their effectiveness in medical diagnosis. Selecting a subset of relevant features and choosing an appropriate machine learning method can exploit the information in microarray data to improve detection accuracy. In this paper, we propose using a correlation feature selection (CFS) method with random committee (RC) ensemble learning to detect prostate cancer from microarray gene expression data. A set of experiments is conducted on a public benchmark dataset using 10-fold cross-validation to evaluate the proposed approach. The experimental results show that the proposed approach attains 95.098% accuracy, which is higher than that of related methods on the same dataset.
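
To illustrate the feature-selection step only, the following is a minimal sketch of correlation-based feature selection with a greedy forward search. It is not the authors' implementation: the abstract does not state which correlation measure, search strategy, or software was used, so the absolute Pearson correlation, the stopping rule, the function names (`pearson`, `cfs_merit`, `cfs_forward_select`), and the toy expression values below are all assumptions made for illustration. The random committee classifier and the 10-fold cross-validation are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, r_cf, r_ff):
    """CFS merit of a subset: k * mean(r_cf) / sqrt(k + k*(k-1) * mean(r_ff)).

    r_cf[j] is the |correlation| of feature j with the class;
    r_ff[a][b] is the |correlation| between features a and b.
    """
    k = len(subset)
    mean_cf = sum(r_cf[j] for j in subset) / k
    if k == 1:
        return mean_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    mean_ff = sum(r_ff[a][b] for a, b in pairs) / len(pairs)
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


def cfs_forward_select(columns, labels, max_features=10):
    """Greedy forward search: repeatedly add the feature that raises the merit most.

    Stops when no remaining feature improves the merit (an assumed stopping rule).
    """
    p = len(columns)
    r_cf = [abs(pearson(col, labels)) for col in columns]
    r_ff = [[abs(pearson(columns[a], columns[b])) for b in range(p)] for a in range(p)]
    selected, best = [], 0.0
    while len(selected) < max_features:
        candidates = [
            (cfs_merit(selected + [j], r_cf, r_ff), j)
            for j in range(p)
            if j not in selected
        ]
        if not candidates:
            break
        merit, j = max(candidates)
        if merit <= best:
            break
        selected.append(j)
        best = merit
    return selected, best


# Hypothetical toy data: 8 samples (0 = normal, 1 = tumor) and 3 "genes".
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0.1, 0.2, 0.1, 0.3, 0.9, 0.8, 1.0, 0.9],  # gene 0: tracks the class closely
    [0.2, 0.1, 0.3, 0.2, 0.7, 0.9, 0.8, 0.6],  # gene 1: also informative
    [0.5, 0.1, 0.9, 0.3, 0.4, 0.8, 0.2, 0.6],  # gene 2: mostly noise
]
selected, merit = cfs_forward_select(columns, labels)
print("selected genes:", selected, "merit:", round(merit, 4))
```

The merit rewards features that correlate with the class and penalizes features that correlate with each other, so on this toy data the noisy third gene is left out. With thousands of genes the full feature-to-feature correlation matrix computed here would be costly, and a practical version would compute those correlations only as needed.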

Keywords: 10-fold cross-validation; ensemble learning; feature selection; machine learning; microarray data; prostate cancer; random committee.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Gene Expression
  • Humans
  • Machine Learning
  • Male
  • Prostatic Neoplasms* / diagnosis
  • Prostatic Neoplasms* / genetics