An Overview of Best Practices for Transposable Element Identification, Classification, and Annotation in Eukaryotic Genomes

Methods Mol Biol. 2023:2607:1-23. doi: 10.1007/978-1-0716-2883-6_1.

Abstract

Transposable elements (TEs) exert an increasingly diverse spectrum of influences on eukaryotic genome structure, function, and evolution. A deluge of genomic, transcriptomic, and proteomic data provides the foundation for turning essentially any non-model eukaryotic species into an emerging model to study any and all aspects of organismal biology, ultimately shaping future directions for biomedical, environmental, and biodiversity research. However, identification and annotation of the mobile genome component still lag behind the standards accepted for host gene annotation. To achieve the objective of providing every genome project with a comprehensive description of its mobilome component in addition to the standard genic and transcriptomic datasets, each step of TE identification, classification, and annotation should focus on improving TE boundary designation, reducing identification error rates, and providing accurate information on the type and integrity of TE insertions. Here, we offer practical advice for generating TE models in de novo assemblies for non-model organisms, provide step-by-step instructions to guide inexperienced TE annotators through some of the commonly used TE analysis pipelines, and entertain suggestions for tool improvement that could be implemented by interested developers.

Keywords: Consensus sequences; DNA transposons; De novo repeat identification; Manual curation; Repeat library; Repetitive DNA; Retrotransposons.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • DNA Transposable Elements* / genetics
  • Eukaryota* / genetics
  • Eukaryotic Cells
  • Molecular Sequence Annotation
  • Proteomics

Substances

  • DNA Transposable Elements