Sequence-based bacterial small RNAs prediction using ensemble learning strategies

BMC Bioinformatics. 2018 Dec 21;19(Suppl 20):503. doi: 10.1186/s12859-018-2535-1.

Abstract

Background: Bacterial small non-coding RNAs (sRNAs) have emerged as important elements in diverse physiological processes, including growth, development, cell proliferation, differentiation, metabolic reactions and carbon metabolism, and have attracted considerable attention. Accurate prediction of sRNAs is important but challenging, and it helps in exploring the functions and mechanisms of sRNAs.

Results: In this paper, we use a variety of sRNA sequence-derived features to develop ensemble learning methods for sRNA prediction. First, we compile one balanced dataset and four imbalanced datasets. Then, we investigate various sRNA sequence-derived features, such as spectrum profile, mismatch profile, reverse complement k-mer and pseudo nucleotide composition. Finally, we consider two ensemble learning strategies that integrate all features to build ensemble models for sRNA prediction. One is the weighted average ensemble method (WAEM), which uses the linear weighted sum of outputs from the individual feature-based predictors to predict sRNAs. The other is the neural network ensemble method (NNEM), which trains a deep neural network by combining diverse features. In the computational experiments, we evaluate our methods on these five datasets using 5-fold cross-validation. WAEM and NNEM can produce better results than existing state-of-the-art sRNA prediction methods.
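As an illustration of two ideas named above, the following is a minimal Python sketch of a spectrum profile (taken here to mean normalized k-mer frequencies) and of a linear weighted sum over per-feature predictor scores. It is not the authors' implementation: the function names, the choice of k, the conversion of T to U, and the example scores and weights are all assumptions made for illustration, and the abstract does not state how the WAEM weights are chosen.

```python
from itertools import product


def spectrum_profile(seq, k=2):
    """Return normalized k-mer frequencies over the alphabet A, C, G, U.

    Assumption: DNA-style input is accepted and T is mapped to U;
    k-mers containing any other symbol are skipped.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    # Avoid division by zero for sequences shorter than k.
    return [counts[km] / total if total else 0.0 for km in kmers]


def weighted_average_ensemble(scores, weights):
    """Combine per-predictor scores with a linear weighted sum.

    Weights are normalized to sum to 1, so the result stays in the
    same range as the input scores.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, scores)) / total


# Hypothetical scores from four feature-based predictors for one sequence,
# and placeholder weights (not values from the paper).
example_scores = [0.82, 0.64, 0.71, 0.90]
example_weights = [0.4, 0.1, 0.2, 0.3]
combined = weighted_average_ensemble(example_scores, example_weights)
```

In this sketch, each feature type would feed its own predictor, and `combined` is the ensemble score that would then be compared against a decision threshold to label the sequence as an sRNA or not.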

Conclusions: WAEM and NNEM have great potential for sRNA prediction and are helpful for understanding the biological mechanisms of bacteria.

Keywords: Ensemble learning; Neural network; Sequence-derived feature; Small RNA prediction.

MeSH terms

  • Algorithms*
  • Area Under Curve
  • Bacteria / genetics*
  • Base Sequence
  • Benchmarking
  • Computational Biology / methods*
  • Databases, Nucleic Acid
  • Neural Networks, Computer
  • RNA, Bacterial / genetics*
  • RNA, Untranslated / genetics*

Substances

  • RNA, Bacterial
  • RNA, Untranslated