SAPPHIRE.CNN: Implementation of dRNA-seq-driven, species-specific promoter prediction using convolutional neural networks

Comput Struct Biotechnol J. 2022 Sep 9:20:4969-4974. doi: 10.1016/j.csbj.2022.09.006. eCollection 2022.

Abstract

Data availability is a consistent bottleneck in the development of species-specific bacterial promoter prediction software. In this work, we leverage genome-wide promoter datasets generated with dRNA-seq in the Gram-negative bacteria Pseudomonas aeruginosa and Salmonella enterica to train promoter predictors. Convolutional neural networks are presented as an optimal architecture for model training and are further modified and tailored for promoter prediction. The resulting predictors reach high binary accuracies (95% and 94.9%) on test sets, and each outperforms the other when predicting promoters in the species it was trained on. SAPPHIRE.CNN is available online and can also be downloaded to run locally. Our results indicate that binary promoter classification depends on an organism's GC content and that our classifiers perform worse on genera they were not trained on, further supporting the need for dedicated, species-specific promoter classification tools.
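The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the SAPPHIRE.CNN code. The function names (`one_hot`, `gc_content`, `binary_accuracy`), the A/C/G/T channel order, and the 0.5 decision threshold are all assumptions made for illustration. The sketch shows one common way to encode a DNA sequence as input for a convolutional network, how a sequence's GC content is computed, and how binary accuracy is scored on a labelled test set.

```python
# Illustrative sketch only; names and defaults are assumptions,
# not taken from the SAPPHIRE.CNN implementation.

BASES = "ACGT"  # assumed channel order for the one-hot encoding


def one_hot(seq):
    """Encode a DNA string as one 4-element row per base (A, C, G, T).

    Symbols outside A/C/G/T (for example N) become an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def gc_content(seq):
    """Return the fraction of G and C among the A/C/G/T bases of seq."""
    counted = [base for base in seq.upper() if base in BASES]
    if not counted:
        return 0.0
    return sum(base in "GC" for base in counted) / len(counted)


def binary_accuracy(labels, scores, threshold=0.5):
    """Return the share of sequences classified correctly.

    labels: 0/1 ground truth (1 = promoter).
    scores: model outputs in [0, 1]; a score >= threshold counts as promoter.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    if not labels:
        raise ValueError("cannot score an empty test set")
    hits = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return hits / len(labels)
```

For example, `gc_content("GGCA")` counts three G/C bases out of four, and `binary_accuracy([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.1])` scores three of four toy predictions as correct. Comparing such accuracies across genomes with different GC content is one way to probe the kind of dependency the abstract reports.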