Analysis of insertional sites of the SIRE1 retroelement family from Glycine max using GenBank BAC-end sequences

In Silico Biol. 2008;8(5-6):531-43.

Abstract

SIRE1 is a 2000-copy member of the Ty1/copia retroelement family found in the soybean genome and is closely related to sireviruses found in the genomes of other legumes. Although these elements closely resemble typical plant members of the Ty1/copia family, they are unusual in that they possess an envelope-like coding region immediately downstream of the reverse transcriptase gene. Despite this copy number, very few members of the SIRE1 family are currently present in publicly available genomic assemblies or draft contigs. However, fragments of family members are well represented as BAC-end sequences in the GenBank Genome Survey Sequence (GSS) database. This database was queried using the 5' and 3' ends of SIRE1 in order to catalog the sequences into which SIRE1 members have integrated. Seven hundred and eighty-one unique SIRE1 insertions were identified, and the majority of insertion sites lay within other repetitive elements, including Class I and Class II transposable elements and satellite DNAs. Ninety-four insertions were in single- or low-copy-number sequences, and three of these were homologous to characterized protein-coding genes. Examination of the ten bases flanking either side of SIRE1 revealed no clear consensus sequence, but the distributions of A, C, G, and T at most of the positions were biased with strong statistical significance.
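The abstract does not say which test was used to detect positional base bias. One common way to ask whether the A/C/G/T distribution at each flanking position departs from expectation is a chi-square goodness-of-fit test per position, which the sketch below illustrates. It rests on several assumptions that are not from the source: the helper names (`flank_upstream`, `flank_downstream`, `positional_bias`) are hypothetical; element boundary coordinates are assumed to be known and already expressed in the element's orientation (reverse-strand hits would need reverse-complementing first); and the expected composition defaults to uniform (25% each base), although a genome-wide background composition could be passed in instead.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

BASES = "ACGT"


def flank_upstream(read: str, element_start: int, width: int = 10) -> Optional[str]:
    """Hypothetical helper: the `width` bases immediately 5' of an element start, or None if too short.

    ASSUMPTION: `element_start` is a 0-based index in the element's orientation.
    """
    if element_start < width:
        return None
    return read[element_start - width:element_start].upper()


def flank_downstream(read: str, element_end: int, width: int = 10) -> Optional[str]:
    """Hypothetical helper: the `width` bases immediately 3' of an element end (exclusive index)."""
    if element_end + width > len(read):
        return None
    return read[element_end:element_end + width].upper()


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    Closed form for df = 3: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def positional_bias(
    flanks: List[str], background: Optional[Dict[str, float]] = None
) -> List[Tuple[int, Dict[str, int], float, float]]:
    """Chi-square goodness-of-fit test of A/C/G/T counts at each aligned flank position.

    Returns (position, counts, chi2, p_value) per position; positions with no valid bases are skipped.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}  # ASSUMPTION: uniform expected composition
    if not flanks:
        return []
    flanks = [f.upper() for f in flanks]
    width = len(flanks[0])
    if any(len(f) != width for f in flanks):
        raise ValueError("all flanking sequences must have the same length")
    results = []
    for pos in range(width):
        # Count only unambiguous bases; N and other IUPAC codes are ignored.
        observed = Counter(f[pos] for f in flanks if f[pos] in BASES)
        n = sum(observed.values())
        if n == 0:
            continue
        # Four categories give 3 degrees of freedom.
        chi2 = sum(
            (observed[b] - n * background[b]) ** 2 / (n * background[b]) for b in BASES
        )
        counts = {b: observed[b] for b in BASES}
        results.append((pos, counts, chi2, chi2_sf_df3(chi2)))
    return results
```

For example, with the four aligned flanks `AACGT`, `ACCGT`, `AGCGT`, and `ATCGT`, position 0 (all A) gives chi2 = 12.0 (p ≈ 0.0074), while position 1 (one of each base) gives chi2 = 0.0 (p = 1.0). Because ten positions are tested on each side, a multiple-testing correction such as Bonferroni would be a reasonable addition before calling any single position significant.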

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • DNA, Intergenic / genetics*
  • Databases, Nucleic Acid*
  • Glycine / genetics
  • Glycine / metabolism
  • Multigene Family / genetics
  • Retroelements / genetics*
  • Sequence Analysis, DNA / methods*

Substances

  • DNA, Intergenic
  • Retroelements
  • Glycine