M6A-GSMS: Computational identification of N6-methyladenosine sites with GBDT and stacking learning in multiple species

J Biomol Struct Dyn. 2022;40(22):12380-12391. doi: 10.1080/07391102.2021.1970628. Epub 2021 Aug 30.

Abstract

N6-methyladenosine (m6A) is one of the most abundant RNA methylation modifications currently known. It is involved in a wide range of biological processes, including RNA degradation, stability and alternative splicing. The development of convenient and efficient m6A prediction methods is therefore urgent. In this work, a novel predictor based on gradient boosting decision tree (GBDT) and stacking learning, called M6A-GSMS, is developed to identify m6A sites. To achieve accurate prediction, we explore RNA sequence information from four aspects: correlation, structure, physicochemical properties and pseudo ribonucleic acid composition. After feature selection with the GBDT algorithm, a stacking model is constructed by combining seven base classifiers. Compared with other state-of-the-art methods, M6A-GSMS achieves excellent performance in identifying m6A sites. The prediction accuracy for A. thaliana, D. melanogaster, M. musculus, S. cerevisiae and human reaches 88.4%, 60.8%, 80.5%, 92.4% and 61.8%, respectively. This method provides an effective predictor for the investigation of m6A sites. In addition, all the datasets and code are currently available at https://github.com/Wang-Jinyue/M6A-GSMS. Communicated by Ramaswamy H. Sarma.
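The two-stage design described in the abstract (GBDT-based feature selection followed by a stacked ensemble) can be sketched with scikit-learn as below. This is only an illustrative sketch, not the authors' implementation: the abstract does not name the seven base classifiers, the meta-learner, or the selection threshold, so the three base learners, the logistic-regression meta-learner, the `threshold="mean"` rule and the synthetic feature matrix are all assumptions. The helper name `build_model` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def build_model(random_state=0):
    """Hypothetical helper: GBDT feature selection + stacking ensemble."""
    # Stage 1: keep features whose GBDT importance is at least the mean
    # importance (the threshold is an assumption, not taken from the paper).
    selector = SelectFromModel(
        GradientBoostingClassifier(n_estimators=50, random_state=random_state),
        threshold="mean",
    )
    # Stage 2: stack several base classifiers. The paper combines seven;
    # three are used here for brevity, and their identity is assumed.
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=random_state)),
        ("svm", SVC()),
        ("knn", KNeighborsClassifier()),
    ]
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-learner
        cv=3,
    )
    return make_pipeline(selector, stack)


# Synthetic stand-in for encoded RNA sequence features (not real m6A data).
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = build_model()
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())
print(f"selected features: {n_selected} of {X.shape[1]}")
print(f"held-out accuracy: {accuracy:.3f}")
```

Wrapping both stages in one pipeline means the feature selector is fitted only on the training split, which avoids leaking held-out labels into the selection step. The accuracy printed here reflects the synthetic data only and is unrelated to the figures reported in the abstract.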

Keywords: GBDT; N6-methyladenosine site; RNA methylation; Stacking learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenosine / chemistry
  • Arabidopsis* / genetics
  • Humans
  • Methylation
  • RNA* / chemistry

Substances

  • N-methyladenosine
  • O-(glucuronic acid 2-sulfate)-(1--4)-O-(2,5)-anhydromannitol 6-sulfate
  • RNA
  • Adenosine