A gene selection approach based on the Fisher linear discriminant and the neighborhood rough set

Bioengineered. 2018 Jan 1;9(1):144-151. doi: 10.1080/21655979.2017.1403678. Epub 2017 Dec 19.

Abstract

In recent years, tumor classification based on gene expression profiles has drawn great attention, and related research results have been widely applied to the clinical diagnosis of major genetic diseases. These studies are of tremendous importance for accurate cancer diagnosis and subtype recognition. However, microarray gene expression data are characterized by small sample sizes, high dimensionality, substantial noise, and data redundancy. To further improve the classification performance on microarray data, a gene selection approach based on the Fisher linear discriminant (FLD) and the neighborhood rough set (NRS) is proposed. First, the FLD method is employed to preliminarily reduce the gene data, retaining features with strong classification ability to form a candidate gene subset. Then, neighborhood precision and neighborhood roughness are defined in a neighborhood decision system, and approaches for calculating the neighborhood dependency and the significance of an attribute are given. On this basis, a reduction model of neighborhood decision systems is presented, and a gene selection algorithm based on FLD and NRS is proposed. Finally, four public gene datasets are used in simulation experiments. Experimental results under the SVM classifier demonstrate that the proposed algorithm is effective: it selects a smaller, more discriminative gene subset and achieves better classification performance.
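
The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the exact formulas. The sketch assumes a two-class problem, the common per-gene Fisher criterion (squared difference of class means over the sum of class variances), min-max normalized expression values, Euclidean neighborhoods, and the widely used definition of neighborhood dependency as the fraction of samples whose neighborhood contains only samples of their own class. The function names and the parameters `n_candidates` and `delta` are hypothetical.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-gene two-class Fisher criterion (assumed form):
    (mean0 - mean1)^2 / (var0 + var1)."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12  # guard against zero variance
    return num / den

def neighborhood_dependency(X, y, genes, delta):
    """Fraction of samples whose delta-neighborhood, measured with the
    Euclidean distance on the chosen genes, holds only same-class samples."""
    if len(genes) == 0:
        return 0.0
    Z = X[:, genes]
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    in_neighborhood = dist <= delta
    same_class = y[:, None] == y[None, :]
    # A sample is consistent if every neighbor shares its class label.
    consistent = np.all(~in_neighborhood | same_class, axis=1)
    return float(consistent.mean())

def select_genes(X, y, n_candidates=50, delta=0.2):
    """Stage 1: keep the top-scoring genes by Fisher criterion.
    Stage 2: greedy forward reduction driven by dependency gain."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    Xn = (X - lo) / np.where(hi > lo, hi - lo, 1.0)  # min-max normalization
    order = np.argsort(fisher_scores(Xn, y))[::-1][:n_candidates]
    candidates = [int(g) for g in order]
    selected, best = [], 0.0
    while candidates:
        deps = [neighborhood_dependency(Xn, y, selected + [g], delta)
                for g in candidates]
        i = int(np.argmax(deps))
        if deps[i] - best <= 0:  # significance of the best gene is zero: stop
            break
        best = deps[i]
        selected.append(candidates.pop(i))
    return selected, best
```

Here the significance of a gene is taken as the increase in neighborhood dependency when it is added to the current subset, and the search stops once no candidate yields a positive gain. The returned subset would then be passed to a classifier such as an SVM for evaluation, as in the experiments the abstract mentions.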

Keywords: Fisher linear discriminant; gene selection; neighborhood rough set; reduction.

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Colonic Neoplasms / diagnosis
  • Colonic Neoplasms / genetics*
  • Colonic Neoplasms / pathology
  • Computational Biology
  • Databases, Genetic
  • Datasets as Topic
  • Discriminant Analysis
  • Female
  • Gene Expression Profiling
  • Gene Expression Regulation, Neoplastic*
  • Genes, Neoplasm*
  • Humans
  • Leukemia / diagnosis
  • Leukemia / genetics*
  • Leukemia / pathology
  • Lung Neoplasms / diagnosis
  • Lung Neoplasms / genetics*
  • Lung Neoplasms / pathology
  • Male
  • Microarray Analysis
  • Multifactor Dimensionality Reduction
  • Prostatic Neoplasms / diagnosis
  • Prostatic Neoplasms / genetics*
  • Prostatic Neoplasms / pathology

Grants and funding

China Postdoctoral Science Foundation, 2016M602247; Fund for Youth Key Teachers of Henan Normal University of China, qd15132; Key Research Project of High School of Henan Province of China, 14A520069; Key Scientific and Technological Project of Xinxiang City of China, CXGG17002; National Natural Science Foundation of China (NSFC), 61772176, 61402153, 61370169, 61502319, U1604154; Key Project of Science and Technology Development of Henan Province of China, 162102210261