Computer analysis of nucleic acid regulatory sequences

Proc Natl Acad Sci U S A. 1977 Oct;74(10):4401-5. doi: 10.1073/pnas.74.10.4401.

Abstract

We describe a computer program designed to facilitate the analysis of nucleic acid sequences. The program can search several nucleic acid sequences for oligonucleotides common to all of them. It can examine a DNA or RNA sequence for two kinds of homologous regions: repetitions and dyad symmetries. The homologies need not be perfect: mismatches and "looping out" of nucleotides are allowed. The program also finds (A+T)- and (G+C)-rich regions, locates restriction enzyme recognition sites, determines the distribution of di- and trinucleotides, and performs various other functions. We include two representative applications of the program. All published prokaryotic transcription termination sequences (June 1977) were found to share the following features: (i) a string of at least five T residues, (ii) the sequence CGGGC or a close analog immediately preceding the T cluster, and (iii) a region of strong dyad symmetry preceding the Ts and including the CGGGC sequence. A sequence of 221 nucleotides consisting of the Escherichia coli trp promoter, operator, and leader was found to contain two strong dyad symmetries. These homologies both occur at known regulatory sites; no comparable homologies occur in regions without regulatory significance.
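
The searches described in the abstract can be illustrated with a minimal Python sketch. This is not the 1977 program: the function names (common_oligos, dyad_symmetries, terminator_motif), their parameters, and the example sequences are all hypothetical and chosen only for illustration. The sketch assumes DNA written in the A/C/G/T alphabet, tolerates mismatches in a dyad symmetry but not "looping out" of nucleotides, and matches CGGGC exactly rather than "a close analog".

```python
import re

# Watson-Crick complement table for DNA (assumed alphabet: A, C, G, T).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def common_oligos(seqs, k):
    """Return the set of k-nucleotide words present in every sequence."""
    kmer_sets = [{s[i:i + k] for i in range(len(s) - k + 1)} for s in seqs]
    return set.intersection(*kmer_sets)


def dyad_symmetries(seq, arm, max_gap=10, max_mismatch=0):
    """List (left_start, right_start, mismatches) for arm pairs in which the
    right arm matches the reverse complement of the left arm, allowing up to
    max_mismatch substitutions and up to max_gap unpaired bases between arms.
    Looping out within an arm is NOT modeled in this sketch."""
    hits = []
    n = len(seq)
    for i in range(n - 2 * arm + 1):
        left_rc = reverse_complement(seq[i:i + arm])
        for gap in range(max_gap + 1):
            j = i + arm + gap
            if j + arm > n:
                break
            right = seq[j:j + arm]
            mismatches = sum(a != b for a, b in zip(left_rc, right))
            if mismatches <= max_mismatch:
                hits.append((i, j, mismatches))
    return hits


def terminator_motif(seq):
    """Return the start index of an exact CGGGC immediately followed by at
    least five T residues, or None if no such site exists."""
    match = re.search(r"CGGGCT{5,}", seq)
    return match.start() if match else None


# Made-up example sequence: a 6-base arm, a 4-base loop, the arm's reverse
# complement (which contains CGGGC), then a run of six Ts.
example = "AAGCCCGCTTTTGCGGGCTTTTTT"
dyads = dyad_symmetries(example, arm=6, max_gap=6)
motif_start = terminator_motif(example)
shared = common_oligos(["ACGTAC", "TTACGT", "GACGTT"], 4)
```

In this example the dyad search reports the arm pair starting at positions 2 and 12 (0-based), and the motif search finds CGGGC at position 13, directly before the T run. A brute-force scan like this is quadratic in the arm and gap ranges, which is acceptable for sequences of a few hundred nucleotides such as the 221-nucleotide trp region mentioned above.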

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Base Sequence*
  • Computers
  • Methods
  • Oligonucleotides / analysis
  • Operon
  • Transcription, Genetic

Substances

  • Oligonucleotides