Pipelines for cross-species and genome-wide prediction of long noncoding RNA binding

Nat Protoc. 2019 Mar;14(3):795-818. doi: 10.1038/s41596-018-0115-5.

Abstract

Abundant long, noncoding RNAs (lncRNAs) in mammals can bind to DNA sequences and recruit histone- and DNA-modifying enzymes to binding sites to epigenetically regulate target genes. However, most lncRNAs' binding motifs and target sites are unknown. The large numbers of lncRNAs and target sites in the whole genome make it infeasible to examine lncRNA binding to DNA purely experimentally. Here, we report a protocol for lncRNA/DNA-binding analysis that is built upon a database containing the GENCODE-annotated human and mouse lncRNAs, the orthologs of these lncRNAs in 17 mammals, and the genome sequences of the 17 mammals. Cross-species and genome-wide lncRNA/DNA-binding analysis begins with and is driven by database search. The predicted DNA-binding motifs and binding sites answer the general question of which lncRNAs may epigenetically regulate which genes, and can be used to identify potential sites for genome and epigenome editing. To use the protocol, preliminary knowledge of the base-pairing rules that guide the binding of noncoding RNAs to DNA to form triplexes, as well as the skills required to use the UCSC Genome Browser, are needed. A genome-wide prediction takes from 2 to 10 d, and the results are sent to users automatically by e-mail. The platform is updated continuously, making it possible to study more lncRNAs and larger genomic regions in less computational time.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Binding Sites
  • Computational Biology / methods*
  • Genome*
  • Humans
  • Mammals / genetics
  • Mice
  • Oligonucleotides / metabolism
  • Phylogeny
  • RNA, Long Noncoding / metabolism*
  • Species Specificity

Substances

  • Oligonucleotides
  • RNA, Long Noncoding