Graph-based relevancy-redundancy gene selection method for cancer diagnosis

Comput Biol Med. 2022 Aug:147:105766. doi: 10.1016/j.compbiomed.2022.105766. Epub 2022 Jun 27.

Abstract

Microarray data processing is one of the most important applications of molecular biology in cancer diagnosis. A major task in microarray data processing is gene selection, which aims to find a subset of genes that are minimally similar to one another and maximally relevant to the target class. Removing unnecessary, redundant, or noisy data reduces the data dimensionality. This study proposes a graph-theoretic gene selection method for cancer diagnosis. In both its unsupervised and supervised modes, the method ranks genes using well-established social network analysis techniques, such as the maximum weighted clique criterion and edge centrality. The proposed technique has two goals: (i) to maximize the relevancy of the chosen genes to the target class and (ii) to reduce their inner redundancy. In each iteration of the procedure, a maximum weighted clique is selected. The appropriate genes are then chosen from among the features in this maximum clique using edge centrality and gene relevance. Several datasets with different properties, namely Colon, Leukemia, SRBCT, Prostate Tumor, and Lung Cancer, are used to demonstrate the efficacy of the developed model. The method is compared with well-known filter-based gene selection approaches for cancer diagnosis, and the results show that it clearly outperforms them.

Keywords: Cancer diagnosis; Edge centrality; Gene selection; Maximum clique; Social network analysis.
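
The iterative procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: absolute Pearson correlation as both gene-gene similarity and gene-class relevance, a similarity threshold of 0.5 for drawing edges, a brute-force clique search that is only practical for a handful of genes, mean within-clique similarity as a node-level stand-in for edge centrality, and removal of the whole clique once one representative gene has been kept. The names select_genes, max_weight_clique, and pearson are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def max_weight_clique(nodes, sim, threshold):
    """Brute-force search for the clique with the largest total edge weight.

    Two genes are adjacent when their similarity is >= threshold (assumed rule).
    Exponential in the number of genes, so suitable for toy inputs only.
    """
    nodes = sorted(nodes)
    best, best_weight = (nodes[0],), 0.0  # a single gene is a trivial clique
    for size in range(2, len(nodes) + 1):
        for cand in combinations(nodes, size):
            weights = [sim[frozenset(p)] for p in combinations(cand, 2)]
            if all(w >= threshold for w in weights) and sum(weights) > best_weight:
                best, best_weight = cand, sum(weights)
    return best


def select_genes(expr, labels, k, threshold=0.5):
    """Pick up to k genes, one representative per maximum weighted clique."""
    genes = list(expr)
    # Relevance: how strongly each gene tracks the class label (assumed measure).
    relevance = {g: abs(pearson(expr[g], labels)) for g in genes}
    # Similarity: edge weights of the gene graph (assumed measure).
    sim = {frozenset((a, b)): abs(pearson(expr[a], expr[b]))
           for a, b in combinations(genes, 2)}
    remaining, selected = set(genes), []
    while remaining and len(selected) < k:
        clique = max_weight_clique(remaining, sim, threshold)

        def score(g):
            # Stand-in for edge centrality: mean similarity to the other clique members.
            others = [sim[frozenset((g, h))] for h in clique if h != g]
            centrality = sum(others) / len(others) if others else 0.0
            return relevance[g] + centrality

        selected.append(max(clique, key=score))
        # Assumed redundancy rule: drop the clique's mutually similar genes.
        remaining -= set(clique)
    return selected


# Toy data: g1 and g2 are near-duplicates, g3 is unrelated to both.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "g1": [1, 2, 1, 5, 6, 5],
    "g2": [1, 3, 1, 4, 6, 5],
    "g3": [3, 1, 2, 2, 3, 1],
}
selected = select_genes(expr, labels, k=2)
print(selected)
```

In this toy example g1 and g2 form the only non-trivial clique, so at most one of them is kept, which mirrors the stated goal of combining high relevance with low inner redundancy. For real microarray data with thousands of genes, the brute-force search would have to be replaced by a dedicated maximum weighted clique algorithm or heuristic.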

Publication types

  • Clinical Conference

MeSH terms

  • Algorithms*
  • Gene Expression Profiling / methods
  • Humans
  • Neoplasms* / diagnosis
  • Neoplasms* / genetics