An improved method for detection of words with unusual occurrence frequency in nucleotide sequences

J Theor Biol. 1993 Dec 21;165(4):659-72. doi: 10.1006/jtbi.1993.1212.

Abstract

We present a statistical analysis for identifying rare or abundant "words" of arbitrary length in genomic fragments. The approach is novel in two respects: it takes into account the statistical effect of shorter words nested within longer ones, and it introduces a Bayesian correction to minimize the effects of statistical fluctuations and of possible errors in genomic data. The method is applied in a thorough analysis of the abundance of short nucleotide sequences in the Escherichia coli genome.
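
The idea of accounting for shorter words nested in longer ones can be illustrated with a minimal sketch. The abstract does not give the authors' formulas, so the code below substitutes a common estimator: the expected count of a word is derived from the counts of its two overlapping sub-words one letter shorter, divided by the count of the core they share. The function names and the `pseudocount` parameter are hypothetical; the pseudocount is only a simple smoothing stand-in and is not the Bayesian correction described in the paper.

```python
from collections import Counter


def count_words(seq, k):
    """Count all overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def observed_expected(seq, k, pseudocount=1.0):
    """Return {word: (observed, expected)} for every word of length k in seq.

    The expected count uses the two nested (k-1)-letter words and the
    (k-2)-letter core they share. This is an assumed estimator, not the
    paper's own; pseudocount is an illustrative smoothing term.
    """
    if k < 3:
        raise ValueError("k must be at least 3")
    n_k = count_words(seq, k)
    n_k1 = count_words(seq, k - 1)
    n_k2 = count_words(seq, k - 2)
    result = {}
    for word, observed in n_k.items():
        prefix, suffix, core = word[:-1], word[1:], word[1:-1]
        expected = (
            (n_k1[prefix] + pseudocount)
            * (n_k1[suffix] + pseudocount)
            / (n_k2[core] + pseudocount)
        )
        result[word] = (observed, expected)
    return result


if __name__ == "__main__":
    # Toy sequence; a real analysis would use a genomic fragment.
    for word, (obs, exp) in sorted(observed_expected("ACGTACGTAC", 3).items()):
        print(word, obs, round(exp, 2))
```

A word whose observed count lies far above or below its expected count is a candidate for being abundant or rare. Deciding how far is significant requires a statistical test, which this sketch leaves out.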

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence*
  • Escherichia coli / genetics
  • Genome, Bacterial
  • Mathematics
  • Models, Genetic*
  • Repetitive Sequences, Nucleic Acid*