RepeatProfiler: A pipeline for visualization and comparative analysis of repetitive DNA profiles

Mol Ecol Resour. 2021 Apr;21(3):969-981. doi: 10.1111/1755-0998.13305. Epub 2021 Jan 4.

Abstract

Study of repetitive DNA elements in model organisms highlights the role of repetitive elements (REs) in many processes that drive genome evolution and phenotypic change. Because REs are much more dynamic than single-copy DNA, repetitive sequences can reveal signals of evolutionary history over short time scales that may not be evident in sequences from slower-evolving genomic regions. Many tools for studying REs are directed toward organisms with existing genomic resources, including genome assemblies and repeat libraries. However, signals in repeat variation may prove especially valuable in disentangling evolutionary histories in diverse non-model groups, for which genomic resources are limited. Here, we introduce RepeatProfiler, a tool for generating, visualizing, and comparing repetitive element DNA profiles from low-coverage, short-read sequence data. RepeatProfiler automates the generation and visualization of RE coverage depth profiles (RE profiles) and allows for statistical comparison of profile shape across samples. In addition, RepeatProfiler facilitates comparison of profiles by extracting signal from sequence variants across profiles, which can then be analysed as molecular morphological characters using phylogenetic analysis. We validate RepeatProfiler with data sets from ground beetles (Bembidion), flies (Drosophila), and tomatoes (Solanum). We highlight the potential of RE profiles as a high-resolution data source for studies in species delimitation, comparative genomics, and repeat biology.
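
As a rough illustration of what an RE coverage depth profile and a comparison of profile shape involve, the following Python sketch builds a per-base depth profile from read alignment intervals and compares two profiles by correlation. It is not RepeatProfiler's implementation: the function names (coverage_profile, profile_shape_correlation), the interval input format, and the use of Pearson correlation as the shape statistic are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def coverage_profile(ref_len: int, alignments: Sequence[Tuple[int, int]]) -> List[int]:
    """Per-base read depth along a reference repeat sequence.

    Assumption: each alignment is a 0-based, half-open (start, end)
    interval on the reference. Real pipelines would take these from
    a read mapper's output instead.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (ref_len + 1)
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, ref_len)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # A running sum of the difference array gives the depth at each position.
    depth = []
    running = 0
    for i in range(ref_len):
        running += diff[i]
        depth.append(running)
    return depth


def profile_shape_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length depth profiles.

    Assumption: Pearson correlation stands in for a shape statistic here
    because it ignores overall scale, so sequencing depth does not matter.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a flat profile has no shape to correlate")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical toy data: two reads overlapping on a 5 bp reference.
sample_1 = coverage_profile(5, [(0, 3), (2, 5)])
# A second sample with the same shape at twice the depth.
sample_2 = [2 * d for d in sample_1]
shape_similarity = profile_shape_correlation(sample_1, sample_2)
```

Because correlation is insensitive to overall scale, the two toy samples above score close to 1.0 despite differing in depth, which is the sense in which a shape comparison can be separated from sequencing effort.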

Keywords: CNV profiles; comparative analysis; genome evolution; repetitive elements; short-read sequence data.

MeSH terms

  • Animals
  • Coleoptera / genetics
  • DNA*
  • Data Visualization*
  • Drosophila / genetics
  • Evolution, Molecular
  • Genome
  • Genomics
  • Phylogeny
  • Repetitive Sequences, Nucleic Acid*
  • Software*
  • Solanum lycopersicum / genetics

Substances

  • DNA