Simple and fast classification of non-LTR retrotransposons based on phylogeny of their RT domain protein sequences

Gene. 2009 Dec 15;448(2):207-13. doi: 10.1016/j.gene.2009.07.019. Epub 2009 Aug 3.

Abstract

The rapidly growing number of sequenced genomes requires fast and accurate computational tools for the analysis of different transposable elements (TEs). In this paper we focus on a rapid and reliable procedure for classifying autonomous non-LTR retrotransposons based on alignment and clustering of their reverse transcriptase (RT) domains. Typically, the RT domain protein sequences encoded by different non-LTR retrotransposons are similar enough to each other to yield significant BLASTP E-values. Therefore, they can be easily detected by routine BLASTP searches of genomic DNA sequences coding for proteins similar to the RT domains of known non-LTR retrotransposons. However, detailed classification of non-LTR retrotransposons, i.e. their assignment to specific clades, is a slow and complex procedure that has not been formalized or integrated as a standard set of computational methods and data. Here we describe a tool (RTclass1) designed for the fast and accurate automated assignment of novel non-LTR retrotransposons to known or novel clades using phylogenetic analysis of the RT domain protein sequences. RTclass1 classifies a particular non-LTR retrotransposon based on its RT domain in less than 10 min on a standard desktop computer and achieves 99.5% accuracy. RTclass1 works either as a stand-alone program installed locally or as a web server that can be accessed remotely by uploading sequence data over the internet (http://www.girinst.org/RTphylogeny/RTclass1).

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Classification / methods*
  • Models, Genetic
  • Phylogeny*
  • Protein Structure, Tertiary / genetics
  • RNA-Directed DNA Polymerase / chemistry
  • RNA-Directed DNA Polymerase / genetics*
  • Reproducibility of Results
  • Retroelements* / genetics
  • Sequence Analysis, DNA / methods
  • Terminal Repeat Sequences / genetics

Substances

  • Retroelements
  • RNA-Directed DNA Polymerase