Recognition of 3'-end L1, Alu, processed pseudogenes, and mRNA stem-loops in the human genome using sequence-based and structure-based machine-learning models

Sci Rep. 2019 May 10;9(1):7211. doi: 10.1038/s41598-019-43403-3.

Abstract

The role of 3'-end stem-loops in retrotransposition has been experimentally demonstrated for transposons of various species in which LINE-SINE retrotransposons share the same 3'-end sequences, containing a stem-loop. We have discovered that 62-68% of processed pseudogenes and mRNAs also have 3'-end stem-loops. We investigated the properties of 3'-end stem-loops of human L1s, Alus, processed pseudogenes and mRNAs, which do not share the same sequences but all have 3'-end stem-loops. We have built sequence-based and structure-based machine-learning models that are able to recognize 3'-end L1, Alu, processed pseudogene and mRNA stem-loops with high performance. The sequence-based models use only sequence information and capture compositional bias in 3'-ends. The structure-based models consider physical, chemical and geometrical properties of the dinucleotides composing a stem and the position-specific nucleotide content of a loop and a bulge. The most important parameters include shift, tilt, rise, and hydrophilicity. The obtained results clearly point to the existence of structural constraints for 3'-end stem-loops of L1 and Alu, which are probably important for transposition, and reveal the potential of mRNAs to be recognized by the L1 machinery. The proposed approach is applicable to the broader task of recognizing RNA (DNA) secondary structures. The constructed models are freely available on GitHub (https://github.com/AlexShein/transposons/).
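
The two kinds of input described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `kmer_frequencies` and `one_hot_loop`, the choice of k = 2, and the example sequences are assumptions made purely for illustration. The first function turns a sequence into k-mer frequencies, one simple way to expose compositional bias to a sequence-based model; the second encodes a loop position by position, as a structure-based model might. The physical, chemical and geometrical dinucleotide properties of the stem are left out, because their values are not given in the abstract.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k=2):
    """Relative frequencies of all 4**k k-mers, in lexicographic order.

    Hypothetical helper: a simple compositional feature vector.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA alike
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in kmers]


def one_hot_loop(loop):
    """Position-specific one-hot encoding of a loop, 4 values per position.

    Hypothetical helper: ambiguous bases are encoded as all zeros.
    """
    loop = loop.upper().replace("U", "T")
    encoded = []
    for base in loop:
        encoded.extend(1 if base == letter else 0 for letter in ALPHABET)
    return encoded


if __name__ == "__main__":
    # Made-up sequences, used only to show the shape of the features.
    features = kmer_frequencies("AATAAAGCTTTATT", k=2) + one_hot_loop("GAAA")
    print(len(features))  # 16 dinucleotide frequencies + 4 * 4 loop values
```

Feature vectors of this kind could be passed to any standard classifier; which learning algorithms and parameters the published models actually use is documented in the repository linked above, not here.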

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • 3' Untranslated Regions
  • Alu Elements / genetics*
  • Area Under Curve
  • Genome, Human*
  • Humans
  • Long Interspersed Nucleotide Elements / genetics*
  • Machine Learning*
  • Pseudogenes / genetics*
  • RNA, Messenger / metabolism*
  • ROC Curve

Substances

  • 3' Untranslated Regions
  • RNA, Messenger