Automatic classification within families of transposable elements: application to the mariner Family

Gene. 2009 Dec 15;448(2):227-32. doi: 10.1016/j.gene.2009.08.009. Epub 2009 Aug 27.

Abstract

The higher levels of the classification of transposable elements (TEs), from Classes to Superfamilies or Families, are regularly updated, but the lower levels (below the Family) have received little investigation. This applies in particular to Families that include a large number of copies. In this article we propose an automatic classification of DNA sequences. The procedure is based on an aggregation process using a pairwise distance matrix, which allows us to define several groups, each characterized by a sphere with a central sequence and a radius. The method was tested on the mariner Family, because it is probably one of the most extensively studied Families. Several Subfamilies had already been defined from phylogenetic analyses based on multiple alignments of complete or partial amino-acid sequences of the transposase. The classification obtained here from the DNA sequences of 935 items matches the phylogenies of the transposase. The error rate from a posteriori re-assignment is relatively low.
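
The abstract does not spell out the aggregation procedure, so the following is only a minimal sketch of one way to form groups described by a central sequence and a radius from a pairwise distance matrix. Everything here is an assumption for illustration and not the authors' method: the greedy choice of centre, the fixed `radius`, the toy sequences, the mismatch-fraction distance `p_distance`, and the function names `sphere_clustering` and `reassignment_error`. The last function mimics an a posteriori re-assignment check by counting items whose nearest centre is not the centre of their own group.

```python
def p_distance(a, b):
    """Fraction of mismatching positions between two equal-length sequences.
    (Assumed toy distance; real data would need alignment-based distances.)"""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def sphere_clustering(dist, radius):
    """Greedy aggregation: repeatedly take the unassigned item with the most
    unassigned neighbours within `radius` and make it the centre of a group."""
    unassigned = set(range(len(dist)))
    groups = []  # list of (centre index, sorted member indices)
    while unassigned:
        best_centre, best_members = None, []
        for c in sorted(unassigned):
            members = [j for j in sorted(unassigned) if dist[c][j] <= radius]
            # Strict '>' keeps the lowest-index candidate when sizes tie.
            if len(members) > len(best_members):
                best_centre, best_members = c, members
        groups.append((best_centre, best_members))
        unassigned -= set(best_members)
    return groups


def reassignment_error(dist, groups):
    """Fraction of items whose nearest centre is not their own group's centre."""
    centres = [c for c, _ in groups]
    total = moved = 0
    for centre, members in groups:
        for i in members:
            nearest = min(centres, key=lambda c: dist[i][c])
            total += 1
            moved += nearest != centre
    return moved / total


# Hypothetical toy data: two visibly distinct sets of short sequences.
seqs = ["ACGTACGT", "ACGTACGA", "ACGTACCT",
        "TTGGCCAA", "TTGGCCAT", "TTGGACAA"]
dist = [[p_distance(a, b) for b in seqs] for a in seqs]

groups = sphere_clustering(dist, radius=0.2)
error = reassignment_error(dist, groups)
for centre, members in groups:
    print(f"centre {seqs[centre]}: {[seqs[m] for m in members]}")
print(f"re-assignment error rate: {error:.2f}")
```

In this sketch the radius is a single fixed parameter; a per-group radius, as the description of each group by its own sphere suggests, would require an additional rule that the abstract does not give.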

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Caenorhabditis / genetics
  • Classification / methods*
  • Cluster Analysis
  • DNA Transposable Elements / genetics*
  • DNA-Binding Proteins / classification*
  • DNA-Binding Proteins / genetics*
  • Drosophila / genetics
  • Multigene Family / genetics
  • Mutagenesis, Insertional / genetics
  • Phylogeny
  • Transposases / classification*
  • Transposases / genetics*

Substances

  • DNA Transposable Elements
  • DNA-Binding Proteins
  • mariner transposases
  • Transposases