The theoretical basis of universal identification systems for bacteria and viruses

J Biol Phys Chem. 2005 Dec 1;5(4):121-128. doi: 10.4024/40501.jbpc.05.04.

Abstract

It is shown that the presence/absence pattern of 1000 random oligomers of length 12-13 in a bacterial genome is sufficiently characteristic to readily and unambiguously distinguish any known bacterial genome from any other. Even genomes of extremely closely related organisms, such as strains of the same species, can be distinguished in this way. One evident way to implement this approach in a practical assay is with hybridization arrays. It is envisioned that a single universal array can be readily designed that would allow identification of any bacterium that appears in a database of known patterns. We performed in silico experiments to test this idea. Calculations using 105 publicly available, completely sequenced microbial genomes allowed us to determine appropriate values of the test oligonucleotide length, n, and the number of probe sequences. Randomly chosen n-mers with a constant G + C content were used to form an in silico array and to determine (a) how many n-mers from each genome would hybridize on this chip, and (b) how different the fingerprints of different genomes would be. With an appropriate choice of random oligomer length, the same approach can also be used to identify viral or eukaryotic genomes.
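The in silico experiment described above can be illustrated with a minimal sketch. It is not the authors' code. The function names (random_probes, fingerprint, hamming) are hypothetical, and several points are assumptions rather than details from the abstract: a probe counts as "hybridizing" only on an exact match to either strand, the genome is treated as linear, and fingerprints are compared by Hamming distance. The toy values (n = 6, 50 probes, 2000-base random sequences) are chosen only so the example runs quickly. The abstract itself reports 1000 probes of length 12-13.

```python
import random

# Map each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def random_probes(count, n, gc_count, rng):
    """Draw `count` distinct random n-mers, each with exactly `gc_count` G/C bases."""
    probes = set()
    while len(probes) < count:
        bases = [rng.choice("GC") for _ in range(gc_count)]
        bases += [rng.choice("AT") for _ in range(n - gc_count)]
        rng.shuffle(bases)  # spread the G/C positions at random
        probes.add("".join(bases))
    return sorted(probes)  # fixed order, so fingerprints are comparable


def fingerprint(genome, probes, n):
    """Presence/absence vector of the probes in a genome.

    Assumption: a probe is 'present' if it, or its reverse complement,
    occurs exactly in the given strand (the genome is treated as linear).
    """
    kmers = {genome[i:i + n] for i in range(len(genome) - n + 1)}
    return [
        1 if (p in kmers or reverse_complement(p) in kmers) else 0
        for p in probes
    ]


def hamming(a, b):
    """Number of positions at which two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


# Toy demonstration on two random "genomes" (illustrative data only).
demo_rng = random.Random(42)
toy_n = 6
toy_probes = random_probes(50, toy_n, 3, demo_rng)
genome_a = "".join(demo_rng.choice("ACGT") for _ in range(2000))
genome_b = "".join(demo_rng.choice("ACGT") for _ in range(2000))
fp_a = fingerprint(genome_a, toy_probes, toy_n)
fp_b = fingerprint(genome_b, toy_probes, toy_n)
print("probes hit in A:", sum(fp_a))
print("probes hit in B:", sum(fp_b))
print("fingerprint distance A vs B:", hamming(fp_a, fp_b))
```

The two printed hit counts correspond to question (a) in the abstract, and the distance corresponds to question (b). Real genomes would be read from sequence files, and real hybridization tolerates some mismatches, which this exact-match sketch does not model.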