Gene expression cancer classification using modified K-Nearest Neighbors technique

Biosystems. 2019 Feb;176:41-51. doi: 10.1016/j.biosystems.2018.12.009. Epub 2019 Jan 3.

Abstract

Gene expression microarray classification is a crucial research field because it is employed in cancer prediction and diagnosis systems. Gene expression data consist of dozens of samples, each characterized by thousands of genes, so accurate and effective classification of such samples is a challenge. Machine learning techniques have been widely used to build robust and precise classification models. This paper proposes a new classification technique for gene expression data called modified k-nearest neighbor (MKNN). MKNN is applied in two scenarios, namely smallest modified KNN (SMKNN) and largest modified KNN (LMKNN), both of which aim to improve the performance of KNN. The key idea is to select robust neighbors from the training data by means of a new weighting strategy. Experiments were performed on six different gene expression datasets, comparing MKNN against (i) KNN, (ii) weighted KNN, (iii) support vector machine (SVM), (iv) fuzzy support vector machine, and (v) brain emotional learning (BEL) in terms of classification accuracy, precision, and recall. The results show that MKNN, in both scenarios, outperforms both traditional and recent techniques. In addition, MKNN requires less testing time than both KNN and weighted KNN.
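The abstract does not spell out MKNN's weighting strategy, so the sketch below covers only the two baselines MKNN is compared against. Plain KNN assigns the majority label among the k nearest training samples. Weighted KNN lets closer neighbors count more. Inverse-distance weighting, Euclidean distance, and the function names `knn_predict` and `weighted_knn_predict` are illustrative assumptions, not the paper's definitions; the paper's weighted KNN variant may differ.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two expression profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(train_X: Sequence[Sequence[float]], train_y: Sequence[Hashable],
            query: Sequence[float], k: int) -> List[Tuple[float, Hashable]]:
    """Return the k (distance, label) pairs closest to the query sample."""
    pairs = [(euclidean(x, query), y) for x, y in zip(train_X, train_y)]
    pairs.sort(key=lambda p: p[0])  # ascending distance
    return pairs[:k]


def knn_predict(train_X, train_y, query, k: int = 3) -> Hashable:
    """Plain KNN: each of the k nearest neighbors casts one equal vote."""
    votes = Counter(label for _, label in nearest(train_X, train_y, query, k))
    return votes.most_common(1)[0][0]


def weighted_knn_predict(train_X, train_y, query, k: int = 3,
                         eps: float = 1e-9) -> Hashable:
    """Hypothetical weighted KNN: each neighbor votes with weight 1 / (d + eps),
    so closer neighbors dominate; eps avoids division by zero when d == 0."""
    scores = defaultdict(float)
    for d, label in nearest(train_X, train_y, query, k):
        scores[label] += 1.0 / (d + eps)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy two-gene data; real microarray samples have thousands of genes.
    train_X = [(1.0, 0.0), (3.0, 0.0), (3.0, 0.1), (10.0, 0.0)]
    train_y = ["tumor", "normal", "normal", "tumor"]
    query = (0.9, 0.0)
    print(knn_predict(train_X, train_y, query, k=3))           # normal
    print(weighted_knn_predict(train_X, train_y, query, k=3))  # tumor
```

With k = 3, the query's neighbors are one very close "tumor" sample and two farther "normal" samples. Plain KNN outvotes the close sample, while inverse-distance weighting lets it win. This illustrates why the choice of weighting over neighbors can change predictions, which is the lever the abstract says MKNN modifies.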

Keywords: Cancer classification; Data mining; Gene expression; K-Nearest Neighbor; Microarray data classification.

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Data Interpretation, Statistical*
  • Gene Expression Regulation, Neoplastic*
  • Humans
  • Machine Learning
  • Models, Statistical*
  • Neoplasms / classification*
  • Neoplasms / genetics*
  • Neoplasms / pathology
  • Support Vector Machine