A Gene Selection Method Based on Outliers for Breast Cancer Subtype Classification

IEEE/ACM Trans Comput Biol Bioinform. 2022 Sep-Oct;19(5):2547-2559. doi: 10.1109/TCBB.2021.3132339. Epub 2022 Oct 10.

Abstract

Breast cancer is the second most common cancer type and the leading cause of cancer-related deaths worldwide. Because it is a heterogeneous disease, subtyping breast cancer plays an important role in selecting a specific treatment. Gene expression data are a viable alternative for cancer subtype classification, as they represent the state of a cell at the molecular level, but they generally contain a relatively small number of samples compared with a large number of genes. Gene selection is a promising approach for addressing this uneven, high-dimensional matrix of genes versus samples, and it plays an important role in developing efficient cancer subtype classifiers. In this work, we propose an innovative outlier-based gene selection (OGS) method that selects relevant genes to classify breast cancer subtypes efficiently and effectively. Experiments show that our strategy achieves F1 scores of 1.0 for basal and 0.86 for HER2, the two subtypes with the worst prognoses. Compared with other methods, our proposed method achieves a higher F1 score while using 80% fewer genes. In general, our method selects only a few highly relevant genes, speeding up classification and significantly improving the classifier's performance.
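The results above are reported as per-subtype F1 scores. For reference, a per-class F1 score treats one subtype as the positive class and all others as negative, giving F1 = 2TP / (2TP + FP + FN). The minimal sketch below computes this from lists of true and predicted labels. The helper name `per_class_f1`, the label names, and the toy predictions are hypothetical illustrations, not data or code from the paper.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return a dict mapping each class label to its one-vs-rest F1 score.

    Hypothetical helper for illustration; F1 = 2TP / (2TP + FP + FN).
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter()  # predicted class c and truly class c
    fp = Counter()  # predicted class c but truly another class
    fn = Counter()  # truly class c but predicted another class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    scores = {}
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


# Made-up toy labels (not results from the paper).
y_true = ["Basal", "Basal", "Her2", "Her2", "Her2", "Other", "Other"]
y_pred = ["Basal", "Basal", "Her2", "Her2", "Other", "Her2", "Other"]

scores = per_class_f1(y_true, y_pred)
print({c: round(s, 2) for c, s in scores.items()})
# → {'Basal': 1.0, 'Her2': 0.67, 'Other': 0.5}
```

A perfect score for one subtype (as with the toy "Basal" class here) means every sample of that subtype was predicted correctly and no other sample was assigned to it; a per-class score can be high even when other classes are harder to separate.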

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Female
  • Genetic Techniques*
  • Humans
  • Neoplasms*