Identification of new eukaryotic tRNA genes in genomic DNA databases by a multistep weight matrix analysis of transcriptional control regions

Nucleic Acids Res. 1994 Apr 11;22(7):1247-56. doi: 10.1093/nar/22.7.1247.

Abstract

A linear method for searching DNA databases for eukaryotic nuclear tRNA genes is described. Based on a modified version of the general weight matrix procedure, our algorithm relies on recognizing two intragenic control regions, known as the A and B boxes, and a transcription termination signal, and on evaluating the spacing between these elements. Scanning the eukaryotic nuclear DNA database with this algorithm correctly identified 933 of the 940 known tRNA genes (a false-negative rate of 0.74%). Thirty new potential tRNA genes were identified, and the transcriptional activity of two of them was directly verified by in vitro transcription. The total false-positive rate of the algorithm was 0.014%. Structurally unusual tRNA genes, such as those coding for selenocysteine tRNAs, could also be recognized using a set of rules based on their specific properties, and one human gene coding for such a tRNA was identified. Some of the newly identified tRNA genes were found in rather uncommon genomic positions: two in centromeric regions and three within introns. Furthermore, extragenically located B boxes in tRNA genes from various organisms could be detected through a specific subroutine of the standard search program.
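The multistep search summarized above can be illustrated with a minimal sketch. It is not the published program: the abstract does not give the modified weight matrices, the thresholds, or the spacing rules, so everything below is an assumption made for illustration. The training sites `A_SITES` and `B_SITES` are made-up toy alignments, the cutoff of 11.0 and the gap range of 30 to 60 bases are placeholders, and the terminator is approximated as a run of four or more T residues. The function names (`build_weight_matrix`, `scan`, `find_candidates`) are hypothetical.

```python
import math
import re

BASES = "ACGT"

def build_weight_matrix(sites, pseudocount=1.0):
    """Log-odds weight matrix from equal-length aligned sites (uniform background)."""
    matrix = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: math.log2(counts[b] / total / 0.25) for b in BASES})
    return matrix

def scan(matrix, seq, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases
        total = sum(col[base] for col, base in zip(matrix, window))
        if total >= threshold:
            hits.append((start, total))
    return hits

def find_candidates(seq, a_matrix, b_matrix, a_thr, b_thr,
                    gap_range=(30, 60), term_window=40):
    """Pair A-box and B-box hits with acceptable spacing and a downstream T run.

    gap_range and term_window are illustrative placeholders, not published values.
    Returns tuples of (A-box start, B-box start, terminator start).
    """
    candidates = []
    b_hits = scan(b_matrix, seq, b_thr)
    for a_start, _ in scan(a_matrix, seq, a_thr):
        a_end = a_start + len(a_matrix)
        for b_start, _ in b_hits:
            gap = b_start - a_end
            if not gap_range[0] <= gap <= gap_range[1]:
                continue
            b_end = b_start + len(b_matrix)
            # Assumed terminator model: four or more consecutive T residues.
            term = re.search(r"T{4,}", seq[b_end:b_end + term_window])
            if term:
                candidates.append((a_start, b_start, b_end + term.start()))
    return candidates

# Toy training sites, invented for this sketch only.
A_SITES = ["TGGCTCAGTGG", "TAGCGCAGTGG", "TGGCACAGTGG"]
B_SITES = ["GGTTCGAATCC", "GGTTCGATTCC", "GGTTCGAGTCC"]
a_matrix = build_weight_matrix(A_SITES)
b_matrix = build_weight_matrix(B_SITES)

# Synthetic sequence: upstream filler, A box, 40-base gap, B box, T-run terminator.
demo = "CA" * 6 + "TGGCTCAGTGG" + "CA" * 20 + "GGTTCGAATCC" + "ACAC" + "TTTTT" + "ACAC"
candidates = find_candidates(demo, a_matrix, b_matrix, a_thr=11.0, b_thr=11.0)
print(candidates)
```

Each scan is a single pass over the sequence, and the spacing and terminator checks are applied only to windows that already passed a matrix threshold, which is one plausible way to arrange such a multistep filter. Real use would require matrices derived from known tRNA genes and thresholds calibrated against a reference set.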

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Base Sequence
  • DNA
  • Databases, Factual*
  • Humans
  • Information Storage and Retrieval*
  • Molecular Sequence Data
  • Nucleic Acid Conformation
  • Polymerase Chain Reaction
  • RNA, Transfer / chemistry
  • RNA, Transfer / genetics*
  • Regulatory Sequences, Nucleic Acid
  • Transcription, Genetic*

Substances

  • DNA
  • RNA, Transfer