SD-MSAEs: Promoter recognition in human genome based on deep feature extraction

J Biomed Inform. 2016 Jun:61:55-62. doi: 10.1016/j.jbi.2016.03.018. Epub 2016 Mar 24.

Abstract

The prediction and recognition of promoters in the human genome play an important role in DNA sequence analysis. Entropy, in the Shannon sense of information theory, is a versatile tool in bioinformatic analysis. Relative entropy estimators based on statistical divergence (SD) are used to extract meaningful features that distinguish different regions of DNA sequences. In this paper, we choose context features and use a set of SD methods to select the most effective n-mers for distinguishing promoter regions from other DNA regions in the human genome. From the total possible combinations of n-mers, we obtain four sparse distributions based on promoter and non-promoter training samples. The informative n-mers are selected by optimizing the degree of differentiation between these distributions. Specifically, we combine the advantages of statistical divergence and multiple sparse auto-encoders (MSAEs) in deep learning to extract deep features for promoter recognition. We then apply multiple SVMs and a decision model to construct a human promoter recognition method called SD-MSAEs. The framework is flexible in that it can freely integrate new feature extraction methods or new classification models. Experimental results show that our method has high sensitivity and specificity.
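The divergence-based n-mer selection described above can be illustrated with a minimal sketch. The abstract does not specify which divergence measures or optimization procedure the authors use, so this example assumes one simple choice: each n-mer is scored by its own term in the symmetric Kullback-Leibler divergence between the promoter and non-promoter n-mer distributions. The function names (`nmer_distribution`, `rank_nmers`), the pseudocount smoothing, and the toy sequences are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product
from math import log


def nmer_distribution(seqs, n, pseudocount=1.0):
    """Return the smoothed relative frequency of every n-mer over A, C, G, T.

    The pseudocount (an assumption of this sketch) keeps every probability
    strictly positive so that the log ratio below is always defined.
    """
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - n + 1):
            counts[s[i:i + n]] += 1
    vocab = ["".join(p) for p in product("ACGT", repeat=n)]
    # n-mers containing other symbols (e.g. N) are ignored.
    total = sum(counts[k] for k in vocab) + pseudocount * len(vocab)
    return {k: (counts[k] + pseudocount) / total for k in vocab}


def rank_nmers(pos_seqs, neg_seqs, n, top_k):
    """Rank n-mers by their contribution to the symmetric KL divergence."""
    p = nmer_distribution(pos_seqs, n)  # promoter samples
    q = nmer_distribution(neg_seqs, n)  # non-promoter samples
    # Per-n-mer term of KL(p||q) + KL(q||p); it is never negative.
    score = {k: (p[k] - q[k]) * log(p[k] / q[k]) for k in p}
    return sorted(score, key=score.get, reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not real genomic data.
    promoters = ["TATAAATATAAA", "GGTATAAAGG"]
    background = ["GCGCGCGCGCGC", "CCGGCCGGCCGG"]
    print(rank_nmers(promoters, background, n=2, top_k=4))
```

Under this scoring, an n-mer ranks highly whether it is enriched in promoters or in the background set, since either direction helps separate the two classes. In a pipeline like the one the abstract outlines, the frequencies of the selected n-mers would then serve as input features for the later feature-extraction and classification stages.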

Keywords: Context features; Promoter recognition; Sparse autoencoder; Statistical divergence; Support vector machine.

MeSH terms

  • Computational Biology*
  • DNA
  • Genome, Human*
  • Humans
  • Promoter Regions, Genetic*
  • Sequence Analysis, DNA

Substances

  • DNA