An embedded gene selection method using knockoffs optimizing neural network

BMC Bioinformatics. 2020 Sep 22;21(1):414. doi: 10.1186/s12859-020-03717-w.

Abstract

Background: Gene selection refers to finding a small subset of discriminant genes from gene expression profiles. Effectively selecting the genes that affect specific phenotypic traits is an important research problem in biology. Neural networks fit nonlinear data well and can capture features automatically and flexibly. In this work, we propose an embedded gene selection method using a neural network. The important genes are obtained by calculating the weight coefficients after training is completed. To address the black-box nature of neural networks and make the training results interpretable, we use the idea of knockoffs to construct a knockoff feature gene for each original feature gene. This method not only makes the feature genes compete with each other, but also makes each feature gene compete with its knockoff feature gene. This approach can help to select the key genes that affect the decision-making of the neural network.
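
The competition between a gene and its knockoff can be illustrated with a minimal sketch. The abstract does not state how the trained weights are turned into a final selection, so everything here is an assumption: `z` and `z_tilde` are hypothetical importance scores (for example, derived from weight coefficients) for the original and knockoff genes, and the selection rule is the knockoff+ threshold from the general knockoff filter literature, not necessarily the rule used in the paper.

```python
import numpy as np


def knockoff_statistics(z, z_tilde):
    """W_j = |z_j| - |z_tilde_j|; W_j > 0 means gene j outscored its knockoff."""
    z = np.asarray(z, dtype=float)
    z_tilde = np.asarray(z_tilde, dtype=float)
    return np.abs(z) - np.abs(z_tilde)


def knockoff_plus_threshold(w, fdr=0.25):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= fdr."""
    w = np.asarray(w, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of W, ascending.
    for t in np.unique(np.abs(w[w != 0])):
        ratio = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if ratio <= fdr:
            return float(t)
    return float("inf")  # no threshold meets the target; select nothing


def select_genes(z, z_tilde, fdr=0.25):
    """Indices of genes whose statistic W_j reaches the threshold."""
    w = knockoff_statistics(z, z_tilde)
    return np.flatnonzero(w >= knockoff_plus_threshold(w, fdr))


# Hypothetical importance scores for 8 genes and their knockoffs.
z = [20, 15, 18, 12, 9, 1, 3, 2]
z_tilde = [1, 2, 1, 1, 1, 4, 1, 3]
selected = select_genes(z, z_tilde, fdr=0.25)
```

A gene is kept only when its score clearly exceeds that of its knockoff. Genes whose knockoff scores as high or higher produce negative statistics, which the threshold uses to estimate how many of the positive statistics are likely to be false discoveries.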

Results: We verify and analyze the method on maize carotenoid, tocopherol methyltransferase, and raffinose family oligosaccharide datasets, and on a human breast cancer dataset.

Conclusions: The experimental results demonstrate that the knockoffs optimizing neural network method detects relevant genes better than other existing algorithms, especially when processing nonlinear gene expression and phenotype data.

Keywords: Gene mining; Knockoffs; Maize; Neural network; Nonlinear data.

MeSH terms

  • Breast Neoplasms / genetics
  • Computational Biology / methods
  • Data Mining / methods*
  • Female
  • Gene Expression Regulation
  • Humans
  • Neural Networks, Computer*
  • Transcriptome*
  • Zea mays / enzymology
  • Zea mays / genetics
  • Zea mays / metabolism