PromoterLCNN: A Light CNN-Based Promoter Prediction and Classification Model

Genes (Basel). 2022 Jun 23;13(7):1126. doi: 10.3390/genes13071126.

Abstract

Promoter identification is a fundamental step in understanding bacterial gene regulation mechanisms. However, accurate and fast classification of bacterial promoters remains challenging. New methods based on deep convolutional networks have been applied to identify and classify bacterial promoters recognized by sigma (σ) factors and RNA polymerase subunits, which increase affinity for specific DNA sequences to modulate transcription in response to nutritional or environmental changes. This work presents PromoterLCNN, a new multiclass promoter prediction model based on convolutional neural networks (CNNs) that classifies Escherichia coli promoters into the subclasses σ70, σ24, σ32, σ38, σ28, and σ54. The model is a light, fast, and simple two-stage multiclass CNN architecture for promoter identification and classification. Training and testing were performed on a benchmark dataset that is part of RegulonDB. Compared with other CNN-based classifiers on four metrics (accuracy, Acc; sensitivity, Sn; specificity, Sp; and Matthews correlation coefficient, MCC), PromoterLCNN performed similarly to or better than models that commonly use a cascade architecture, while reducing the time needed for training, prediction, and hyperparameter optimization by approximately 30-90% without compromising classification quality.
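The four evaluation parameters named in the abstract (Acc, Sn, Sp, MCC) are standard binary-classification metrics. As a minimal sketch, assuming they follow their usual confusion-matrix definitions, they could be computed as below. The function name `binary_metrics` and the example counts are hypothetical illustrations and are not taken from the paper.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute Acc, Sn, Sp, and MCC from confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives,
    and false negatives. Ratios with a zero denominator are reported
    as 0.0 (a convention chosen for this sketch).
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0       # accuracy
    sn = tp / (tp + fn) if (tp + fn) else 0.0       # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0       # specificity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews corr. coef.
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


# Hypothetical counts, for illustration only (not results from the paper).
example = binary_metrics(tp=90, tn=80, fp=20, fn=10)
```

For a multiclass problem such as the six σ subclasses, one common option is to apply these formulas per class in a one-vs-rest fashion; whether the authors did so is not stated in the abstract.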

Keywords: PromoterLCNN; bacterial promoters; bioinformatics; convolutional neural networks; deep learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA-Directed RNA Polymerases* / genetics
  • Escherichia coli / genetics
  • Escherichia coli / metabolism
  • Gene Expression Regulation, Bacterial
  • Promoter Regions, Genetic
  • Sigma Factor* / genetics
  • Sigma Factor* / metabolism

Substances

  • Sigma Factor
  • DNA-Directed RNA Polymerases

Grants and funding

This work has been supported by Project USM PI_M_2020_43 (D.H., N.J., M.A., R.E.D., C.B.-A.), Millennium Institute for Foundational Research on Data (IMFD) (C.B.-A.), ANID-Basal Project FB0008 (AC3E; M.A.) and ANID PIA/APOYO AFB180002 (CCTVal; M.A.).