ConsRM: collection and large-scale prediction of the evolutionarily conserved RNA methylation sites, with implications for the functional epitranscriptome

Brief Bioinform. 2021 Nov 5;22(6):bbab088. doi: 10.1093/bib/bbab088.

Abstract

Motivation: N6-methyladenosine (m6A) is the most prevalent RNA modification on mRNAs and lncRNAs. Growing evidence demonstrates its crucial importance in essential molecular mechanisms and various diseases. With recent advances in sequencing techniques, tens of thousands of m6A sites are identified in a typical high-throughput experiment, which makes it a key challenge to distinguish the functional m6A sites from the remaining 'passenger' (or 'silent') sites.

Results: We performed a comparative conservation analysis of the human and mouse m6A epitranscriptomes at single-site resolution. A novel scoring framework, ConsRM, was devised to quantitatively measure the degree of conservation of individual m6A sites. ConsRM combines multiple information sources with a positive-unlabeled learning framework that integrates genomic and sequence features to trace subtle hints of epitranscriptome-layer conservation. In a series of validation experiments in mouse, fly and zebrafish, ConsRM outperformed widely adopted conservation scores (phastCons and phyloP) in distinguishing conserved from unconserved m6A sites. Additionally, m6A sites with a higher ConsRM score are more likely to be functionally important. An online database containing the conservation metrics of 177 998 distinct human m6A sites was developed to support conservation analysis and functional prioritization of individual m6A sites. It is freely accessible at: https://www.xjtlu.edu.cn/biologicalsciences/con.
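
The abstract does not describe how the positive-unlabeled (PU) learning framework is built, so the following is only a minimal sketch of the general PU idea, not ConsRM's implementation. It assumes a generic bagging scheme: known conserved sites are the positives, all other sites are unlabeled, and random subsets of the unlabeled sites stand in as provisional negatives. The function names (`centroid`, `sq_dist`, `pu_bagging_scores`), the two-dimensional toy feature vectors and the nearest-centroid scorer are all hypothetical choices made for illustration.

```python
import random


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_bagging_scores(positives, unlabeled, n_rounds=50, seed=0):
    """Average out-of-bag score per unlabeled item (higher = more positive-like).

    Each round draws a random subset of the unlabeled items as provisional
    negatives, then scores only the items left out of that subset.
    """
    if not positives or len(unlabeled) < 2:
        raise ValueError("need at least 1 positive and 2 unlabeled items")
    rng = random.Random(seed)
    pos_center = centroid(positives)
    # Keep at least one unlabeled item out of the bag in every round.
    k = min(len(positives), len(unlabeled) - 1)
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_rounds):
        bag = set(rng.sample(range(len(unlabeled)), k))
        neg_center = centroid([unlabeled[i] for i in bag])
        for i, x in enumerate(unlabeled):
            if i in bag:
                continue  # provisional negatives are not scored this round
            # Positive when x is closer to the positives than to the bag.
            totals[i] += sq_dist(x, neg_center) - sq_dist(x, pos_center)
            counts[i] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


# Made-up toy features, not real m6A data.
known_conserved = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0)]
unlabeled_sites = [
    (1.0, 0.95), (1.05, 1.1),                   # resemble the positives
    (-1.0, -1.0), (-1.2, -0.8), (-0.9, -1.1),   # do not
    (-1.1, -1.0), (-0.8, -1.2), (-1.0, -0.9),
]
scores = pu_bagging_scores(known_conserved, unlabeled_sites)
```

Averaging over many random bags is one way to limit the effect of true positives hiding among the unlabeled items, since any single bag may include some of them. In this toy setup the first two unlabeled sites receive higher scores than the remaining six.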

Keywords: N6-methyladenosine (m6A); conservation analysis; genome analysis; scoring framework.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Humans
  • Mice
  • RNA Processing, Post-Transcriptional*
  • RNA, Messenger / biosynthesis
  • RNA, Messenger / genetics*
  • Sequence Analysis, RNA*
  • Software*
  • Transcriptome*
  • Zebrafish

Substances

  • RNA, Messenger