In Silico Promoter Recognition from deepCAGE Data

Methods Mol Biol. 2017:1468:171-99. doi: 10.1007/978-1-4939-4035-6_13.

Abstract

The accurate identification of transcription start regions corresponding to the promoters of known genes, novel coding and noncoding transcripts, as well as enhancer elements, is a crucial step towards a complete understanding of state-specific gene regulatory networks. Recent high-throughput techniques, such as deepCAGE or single-molecule CAGE, have made it possible to identify the genome-wide location, relative expression, and differential usage of transcription start regions across hundreds of different tissues and cell lines. Here, we describe in detail the necessary computational analysis of CAGE data, with a focus on two recent in silico methodologies for CAGE peak/profile definition and promoter recognition, namely the Decomposition-based Peak Identification (DPI) and the PROmiRNA software. We apply both methodologies to the challenging task of identifying primary microRNA transcript (pri-miRNA) start sites and compare the results.

Keywords: DPI; PROmiRNA; Promoter; TSS; microRNAs.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • Computer Simulation
  • Databases, Genetic
  • High-Throughput Screening Assays
  • Humans
  • MicroRNAs / genetics*
  • Promoter Regions, Genetic*
  • Sequence Analysis, RNA / methods*
  • Software
  • Transcription Initiation Site
  • Transcription, Genetic

Substances

  • MicroRNAs