i2OM: Toward a better prediction of 2'-O-methylation in human RNA

Int J Biol Macromol. 2023 Jun 1:239:124247. doi: 10.1016/j.ijbiomac.2023.124247. Epub 2023 Mar 30.

Abstract

2'-O-methylation (2OM) is an omnipresent post-transcriptional modification in RNAs. It is important for the regulation of RNA stability, mRNA splicing and translation, as well as innate immunity. With the increase in publicly available 2OM data, several computational tools have been developed to identify 2OM sites in human RNA. Unfortunately, these tools suffer from redundant features with low discriminative power, unreasonable dataset construction, or overfitting. To address these issues, we developed a two-step feature selection model to identify 2OM sites, based on data for four types of 2OM: 2OM-adenine (A), 2OM-cytosine (C), 2OM-guanine (G), and 2OM-uracil (U). For each type, one-way analysis of variance (ANOVA) combined with mutual information (MI) was used to rank sequence features and obtain the optimal feature subset. Four predictors based on eXtreme Gradient Boosting (XGBoost) or support vector machine (SVM) were then built to identify the four types of 2OM sites. The resulting model achieved an overall accuracy of 84.3% on the independent test set. For the convenience of users, an online tool called i2OM was constructed and is freely accessible at i2om.lin-group.cn. The predictor may serve as a reference for studies of 2OM.
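The abstract does not say which sequence features were used or exactly how ANOVA and MI were combined, so the following is only a minimal sketch of one plausible two-step filter, not the authors' method. It assumes (1) a hypothetical k-mer composition encoding of each RNA window, (2) step one keeps the features with the highest one-way ANOVA F statistic, and (3) step two re-ranks those survivors by mutual information with the class label, estimated after equal-width binning. All function and parameter names (`kmer_features`, `anova_f`, `mutual_information`, `two_step_select`, `keep_anova`, `keep_final`, `bins`) are illustrative inventions. The selected subset would then be passed to a classifier such as XGBoost or an SVM, which is not shown here.

```python
import math
from collections import Counter
from itertools import product


def kmer_features(seq, k=2):
    """Hypothetical encoding: normalized k-mer composition of an RNA window."""
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)
    return [counts[km] / total for km in kmers]


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across the class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, g = len(values), len(groups)
    if g < 2 or n <= g:
        return 0.0  # F is undefined with one group or no within-group dof
    grand = sum(values) / n
    means = {y: sum(vs) / len(vs) for y, vs in groups.items()}
    ss_between = sum(len(vs) * (means[y] - grand) ** 2 for y, vs in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, vs in groups.items() for v in vs)
    if ss_within == 0:
        # Perfect separation (or a constant feature) gives an unbounded / zero F.
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (g - 1)) / (ss_within / (n - g))


def mutual_information(values, labels, bins=5):
    """MI (in nats) between an equal-width-binned feature and the labels."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant feature: everything lands in bin 0
    binned = [min(int((v - lo) / width), bins - 1) for v in values]
    n = len(values)
    joint = Counter(zip(binned, labels))
    pb, py = Counter(binned), Counter(labels)
    return sum(c / n * math.log(c * n / (pb[b] * py[y])) for (b, y), c in joint.items())


def two_step_select(X, y, keep_anova=10, keep_final=5):
    """Step 1: keep the top features by ANOVA F. Step 2: re-rank them by MI."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    by_f = sorted(range(n_features), key=lambda j: anova_f(columns[j], y), reverse=True)
    survivors = by_f[:keep_anova]
    by_mi = sorted(survivors, key=lambda j: mutual_information(columns[j], y), reverse=True)
    return by_mi[:keep_final]


if __name__ == "__main__":
    # Toy, made-up windows: "positives" are A-rich, "negatives" are U-rich.
    pos = ["AAAAAGCA", "AAAAACGA", "AAAAAGGA"]
    neg = ["UUUUUGCU", "UUUUUCGU", "UUUUUGGU"]
    X = [kmer_features(s) for s in pos + neg]
    y = [1] * len(pos) + [0] * len(neg)
    print(two_step_select(X, y, keep_anova=10, keep_final=2))
```

On this toy data the AA (index 0) and UU (index 15) dinucleotide frequencies separate the two classes perfectly, so both survive the ANOVA step and reach the maximum MI of log 2, and they are the two features returned. Filtering cheaply with ANOVA before the MI step keeps the more expensive MI estimate restricted to a short list; whether i2OM orders the two criteria this way is an assumption.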

Keywords: 2′-O-methylation; Feature selection; Machine learning; Web server.

MeSH terms

  • Computational Biology*
  • Cytosine
  • Humans
  • Methylation
  • RNA* / genetics
  • Support Vector Machine

Substances

  • RNA
  • Cytosine