CRM Discovery Beyond Model Insects

Methods Mol Biol. 2019:1858:117-139. doi: 10.1007/978-1-4939-8775-7_10.

Abstract

Although sequenced insect genomes number in the hundreds, little is known about gene regulatory sequences in any species other than the well-studied Drosophila melanogaster. We provide here a detailed protocol for using SCRMshaw, a computational method for predicting cis-regulatory modules (CRMs, also called "enhancers") in sequenced insect genomes. SCRMshaw is effective for CRM discovery throughout the range of holometabolous insects and potentially in even more diverged species, with true-positive prediction rates of 75% or better. Minimal requirements for using SCRMshaw are a genome sequence and training data in the form of known Drosophila CRMs; a comprehensive set of the latter can be obtained from the SCRMshaw download site. For basic applications, a user with only modest computational know-how can run SCRMshaw on a desktop computer. SCRMshaw can be run with a single, narrow set of training data to predict CRMs regulating a specific pattern of gene expression, or with multiple sets of training data covering a broad range of CRM activities to provide an initial rough regulatory annotation of a complete, newly sequenced genome.

Keywords: Enhancer prediction; Genome annotation; Non-model insects; Regulatory genomics; Transcriptional gene regulation.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Computational Biology / methods*
  • DNA / analysis
  • DNA / genetics*
  • Genome, Insect*
  • High-Throughput Nucleotide Sequencing / methods
  • Insect Proteins / genetics*
  • Insecta / genetics*
  • Regulatory Sequences, Nucleic Acid*
  • Sequence Analysis, DNA / methods

Substances

  • Insect Proteins
  • DNA