Predicting Pseudouridine Sites with Porpoise

Methods Mol Biol. 2023:2624:139-151. doi: 10.1007/978-1-0716-2962-8_10.

Abstract

Pseudouridine is a ubiquitous RNA modification that plays a crucial role in many biological processes. However, identifying pseudouridine sites experimentally remains challenging because the available methods are expensive and time-consuming. To this end, we present Porpoise, a computational approach that identifies pseudouridine sites from RNA sequence data. Porpoise builds on a stacking ensemble learning framework with several informative features and achieves competitive performance compared with state-of-the-art approaches. This protocol describes, step by step, how to run the local stand-alone version and how to use the webserver of Porpoise. In addition, we provide a general machine learning framework that helps identify the optimal stacking ensemble learning model by evaluating different combinations of features. This framework can help users build their own pseudouridine predictors from their in-house datasets.
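The stacking idea mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not Porpoise's implementation: the abstract does not name the features or learners, so the two encodings (one-hot and nucleotide composition), the toy centroid-based base learner, the logistic meta-learner, and the synthetic 21-nt windows are all assumptions chosen for illustration. What it does show is the general pattern: base models trained on different feature views produce out-of-fold scores, and a meta-learner is trained on those scores.

```python
import math
import random

NUCS = "ACGU"

def one_hot(seq):
    """Feature view 1 (assumed): 4 indicator values per position."""
    return [1.0 if base == n else 0.0 for base in seq for n in NUCS]

def composition(seq):
    """Feature view 2 (assumed): fraction of each nucleotide in the window."""
    return [seq.count(n) / len(seq) for n in NUCS]

class CentroidScorer:
    """Toy base learner: positive score means closer to the positive-class centroid."""
    def fit(self, X, y):
        self.pos = self._mean([x for x, t in zip(X, y) if t == 1])
        self.neg = self._mean([x for x, t in zip(X, y) if t == 0])
        return self

    def score(self, x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, self.pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, self.neg))
        return d_neg - d_pos

    @staticmethod
    def _mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def fit_logistic(Z, y, lr=0.1, epochs=300):
    """Meta-learner: logistic regression fitted by plain stochastic gradient descent."""
    w = [0.0] * (len(Z[0]) + 1)  # w[0] is the bias term
    for _ in range(epochs):
        for z, t in zip(Z, y):
            g = sigmoid(w[0] + sum(wi * zi for wi, zi in zip(w[1:], z))) - t
            w[0] -= lr * g
            for i, zi in enumerate(z):
                w[i + 1] -= lr * g * zi
    return w

def stack_fit(seqs, y, encoders, k=5):
    """Build out-of-fold base scores, then train the meta-learner on them."""
    n = len(seqs)
    meta = [[0.0] * len(encoders) for _ in range(n)]
    for j, enc in enumerate(encoders):
        X = [enc(s) for s in seqs]
        for fold in range(k):
            train = [i for i in range(n) if i % k != fold]
            model = CentroidScorer().fit([X[i] for i in train], [y[i] for i in train])
            for i in range(fold, n, k):  # held-out samples of this fold
                meta[i][j] = model.score(X[i])
    # Refit each base learner on all data for use at prediction time.
    base = [CentroidScorer().fit([enc(s) for s in seqs], y) for enc in encoders]
    return base, fit_logistic(meta, y)

def stack_predict(seq, encoders, base, w):
    z = [m.score(enc(seq)) for m, enc in zip(base, encoders)]
    return sigmoid(w[0] + sum(wi * zi for wi, zi in zip(w[1:], z)))

def toy_window(rng, positive):
    """Synthetic 21-nt window centred on U; NOT real pseudouridine data."""
    seq = [rng.choice(NUCS) for _ in range(21)]
    seq[10] = "U"
    if positive:
        seq[11] = "G"  # arbitrary synthetic signal with no biological meaning
    return "".join(seq)

rng = random.Random(0)
labels = [i % 2 for i in range(40)]
windows = [toy_window(rng, t == 1) for t in labels]
encoders = [one_hot, composition]
base, w = stack_fit(windows, labels, encoders)
prob = stack_predict(windows[1], encoders, base, w)
print(f"stacked probability for one toy window: {prob:.3f}")
```

The out-of-fold step matters: if the meta-learner were trained on scores that the base models produced for their own training samples, it would see overly optimistic inputs. In practice the toy learners here would be replaced by real classifiers and the synthetic windows by a labelled in-house dataset.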

Keywords: Machine learning; RNA pseudouridine site; Sequence analysis; Stacking ensemble learning.

MeSH terms

  • Base Sequence
  • Machine Learning
  • Pseudouridine*
  • RNA* / genetics

Substances

  • Pseudouridine
  • RNA