Helitron distribution in Brassicaceae and whole Genome Helitron density as a character for distinguishing plant species

BMC Bioinformatics. 2019 Jun 24;20(1):354. doi: 10.1186/s12859-019-2945-8.

Abstract

Background: Helitrons are rolling-circle DNA transposons that play an important role in plant evolution. However, their distribution and contribution to evolution at the family level have not been investigated previously.

Results: We developed easy-to-annotate Helitron (EAHelitron), a Unix-like command-line tool, and used it to identify Helitrons in 53 diverse plant genomes, including 13 Brassicaceae species. We determined Helitron density (abundance per Mb) and visualized and examined Helitron distribution patterns. We identified more than 104,653 Helitrons, including many new Helitrons not predicted by other software. Whole-genome Helitron density is independent of genome size and is stable at the species level. Using linear discriminant analysis, de novo genomes (assembled from next-generation sequencing data) were successfully classified into Arabidopsis thaliana groups. For most Brassicaceae species, Helitron density was negatively correlated with gene density, and Helitron distribution patterns were similar to those of A. thaliana: Helitrons preferentially inserted into sequences around the centromere and into intergenic regions. We also associated 13 Helitron insertion polymorphism loci with flowering-time phenotypes in 18 A. thaliana ecotypes.
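
The two density measures mentioned in the results, whole-genome density (abundance per Mb) and its per-window correlation with gene density, can be illustrated with a minimal sketch. It is not EAHelitron code and does not reproduce the authors' pipeline; it assumes Helitron and gene start coordinates are already available as 0-based positions on a single sequence, and the function names, the 100 kb window size and the toy coordinates are all hypothetical.

```python
# Hypothetical sketch, not EAHelitron code: whole-genome Helitron density
# (abundance per Mb) and its per-window correlation with gene density.
import math


def density_per_mb(count, genome_bp):
    """Number of elements per megabase of sequence."""
    return count / (genome_bp / 1_000_000)


def window_counts(starts, chrom_len, window_bp):
    """Count elements whose 0-based start falls in each fixed-size window."""
    counts = [0] * math.ceil(chrom_len / window_bp)
    for pos in starts:
        counts[pos // window_bp] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented toy chromosome: Helitrons cluster in a central
# (pericentromeric-like) window where genes are sparse.
CHROM_LEN = 500_000
WINDOW_BP = 100_000
helitron_starts = [210_000, 220_000, 250_000, 260_000, 290_000, 310_000]
gene_starts = [10_000, 40_000, 70_000, 120_000, 150_000, 230_000,
               420_000, 450_000, 480_000]

print(density_per_mb(len(helitron_starts), CHROM_LEN))  # → 12.0
h = window_counts(helitron_starts, CHROM_LEN, WINDOW_BP)  # [0, 0, 5, 1, 0]
g = window_counts(gene_starts, CHROM_LEN, WINDOW_BP)      # [3, 2, 1, 0, 3]
print(round(pearson(h, g), 2))  # → -0.51
```

Dividing by sequence length in Mb is what makes density comparable across genomes of different sizes; the negative coefficient on the toy data mirrors the kind of Helitron/gene density relationship described above, but the numbers themselves are illustrative only.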

Conclusion: EAHelitron is a fast and efficient tool for identifying new Helitrons. Whole-genome Helitron density can be an informative character for plant classification, and Helitron insertion polymorphisms could be used in association analyses.
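
How a single density value can act as a classification character is sketched below with a one-feature linear discriminant analysis (shared within-group variance, priors from group sizes). This is an illustrative assumption, not the authors' implementation: the group labels, training densities and function names are hypothetical, and real analyses would typically use more genomes and possibly additional characters.

```python
# Hypothetical sketch: one-feature linear discriminant analysis (LDA) that
# assigns a genome to a group from its whole-genome Helitron density alone.
import math


def fit_lda_1d(samples):
    """samples maps group label -> list of densities (total > number of groups).

    Returns group means, the pooled within-group variance, and log priors
    estimated from group sizes, which is what LDA's shared-variance model needs.
    """
    n_total = sum(len(v) for v in samples.values())
    means = {g: sum(v) / len(v) for g, v in samples.items()}
    ss = sum((x - means[g]) ** 2 for g, v in samples.items() for x in v)
    pooled_var = ss / (n_total - len(samples))
    log_priors = {g: math.log(len(v) / n_total) for g, v in samples.items()}
    return means, pooled_var, log_priors


def predict(model, x):
    """Return the group with the largest linear discriminant score."""
    means, var, log_priors = model
    scores = {
        g: x * mu / var - mu * mu / (2 * var) + log_priors[g]
        for g, mu in means.items()
    }
    return max(scores, key=scores.get)


# Invented training densities (Helitrons per Mb) for two hypothetical groups.
training = {
    "group_A": [10.0, 10.4, 9.8, 10.2],
    "group_B": [15.0, 15.6, 14.8, 15.0],
}
model = fit_lda_1d(training)
print(predict(model, 10.5))  # → group_A
print(predict(model, 14.0))  # → group_B
```

With equal group sizes the log priors cancel and the rule reduces to picking the nearer group mean (here the boundary is 12.6 Helitrons per Mb); the sketch only shows why a density that is stable within a species can separate groups.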

Keywords: Bioinformatics; Genomic evolution; Multivariate analysis; Plant classification; Transposable element.

MeSH terms

  • Arabidopsis / classification
  • Arabidopsis / genetics
  • Brassicaceae / classification
  • Brassicaceae / genetics*
  • DNA Transposable Elements / genetics
  • Discriminant Analysis
  • Evolution, Molecular
  • Genome, Plant*
  • High-Throughput Nucleotide Sequencing
  • Phylogeny
  • Sequence Analysis, DNA
  • Software*

Substances

  • DNA Transposable Elements