A comprehensive annotation dataset of intact LTR retrotransposons of 300 plant genomes

Sci Data. 2021 Jul 15;8(1):174. doi: 10.1038/s41597-021-00968-x.

Abstract

LTR retrotransposons (LTR-RTs) are ubiquitous and represent the dominant repeat element in plant genomes, playing important roles in functional variation, genome plasticity and evolution. With the advent of new sequencing technologies, a growing number of whole-genome sequences have been made publicly available, making it possible to carry out systematic analyses of LTR-RTs. However, a comprehensive and unified annotation of LTR-RTs across plant groups is still lacking. Here, we constructed a dataset of intact plant LTR-RTs, classified and annotated with a standardized procedure. The dataset currently comprises a total of 2,593,685 intact LTR-RTs from the genomes of 300 plant species representing 93 families of 46 orders. It provides the sequence of each LTR-RT together with diverse structural and functional annotations, age determination and classification information. This dataset will be a valuable resource for investigating the evolutionary dynamics and functional implications of LTR-RTs in plant genomes.

Publication types

  • Dataset
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Evolution, Molecular
  • Genome, Plant*
  • Molecular Sequence Annotation
  • Plants / genetics*
  • Retroelements*
  • Terminal Repeat Sequences*

Substances

  • Retroelements