Promoter prediction in Nannochloropsis based on densely connected convolutional neural networks

Methods. 2022 Aug:204:38-46. doi: 10.1016/j.ymeth.2022.03.017. Epub 2022 Mar 31.

Abstract

A promoter is a key DNA element located near the transcription start site that regulates gene transcription by binding RNA polymerase. The identification of promoters is therefore an important research area in synthetic biology. Nannochloropsis is an important unicellular, oleaginous industrial microalga. To date, several studies have identified promoters with specific functions in Nannochloropsis by biological methods, whereas few have used computational methods. Here, we propose DNPPro (DenseNet-Predict-Promoter), a method based on densely connected convolutional neural networks for predicting promoters in Nannochloropsis. First, we collected promoter sequences from six Nannochloropsis strains and, for each strain, used CD-HIT to remove redundant sequences at an 80% similarity threshold, yielding a reliable positive dataset. Then, to construct a robust classifier, we generated the negative dataset with a within-group scrambling method, which overcomes the limitation of randomly selecting non-promoter regions from the same genome as negative samples. Finally, we constructed a densely connected convolutional neural network that takes the one-hot encoded sequence as input. Compared with commonly used sequence processing methods, DNPPro extracts features from long sequences to a greater extent. A cross-strain experiment on an independent dataset verifies the generalization ability of our method, and t-SNE visualization shows that it effectively distinguishes promoters from non-promoters.
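
The two data-preparation steps named in the abstract, one-hot encoding of the input and within-group scrambling for negative samples, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not define the scrambling procedure, so the sketch assumes it means shuffling bases inside consecutive fixed-size groups of a promoter sequence; the function names, the group_size parameter, and the A, C, G, T channel order are all hypothetical choices made for illustration.

```python
import random

ALPHABET = "ACGT"  # assumed channel order; the abstract does not specify one


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows, one row per base.

    Symbols outside A/C/G/T (for example N) become an all-zero row.
    """
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in seq.upper()]


def scramble_within_groups(seq, group_size=10, rng=None):
    """Build a negative sample by shuffling bases inside each group.

    Assumption: "within-group scrambling" is read here as splitting the
    promoter into consecutive windows of group_size bases and permuting
    the bases inside each window, so that length and per-window base
    composition are preserved while motif order is disrupted.
    """
    if rng is None:
        rng = random.Random()
    pieces = []
    for start in range(0, len(seq), group_size):
        group = list(seq[start:start + group_size])
        rng.shuffle(group)  # in-place permutation of this window only
        pieces.append("".join(group))
    return "".join(pieces)


if __name__ == "__main__":
    promoter = "ACGTACGTTTGACCA"  # toy sequence, not real data
    negative = scramble_within_groups(promoter, group_size=5,
                                      rng=random.Random(0))
    print(negative)
    print(one_hot(promoter)[:2])
```

Under this reading, each negative sample keeps the length and local base composition of the promoter it was derived from, so a classifier cannot separate the two classes on composition alone. That is one plausible motivation for preferring scrambled promoters over randomly chosen non-promoter regions.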

Keywords: Deep learning; Densely connected convolutional neural networks; Nannochloropsis; Promoter; Within-group scrambling.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Neural Networks, Computer*
  • Promoter Regions, Genetic
  • Synthetic Biology*