Detection of deleted genomic DNA using a semiautomated computational analysis of GeneChip data

Genome Res. 2000 Dec;10(12):2044-54. doi: 10.1101/gr.gr-1529r.

Abstract

Genomic diversity within and between populations is caused by single nucleotide mutations, changes in repetitive DNA systems, recombination mechanisms, and insertion and deletion events. The contribution of these sources to diversity, whether purely genetic or of phenotypic consequence, can only be investigated if we have the means to quantitate and characterize diversity in many samples. With the advent of complete sequence characterization of representative genomes of different species, the possibility of developing protocols to screen for genetic polymorphism across entire genomes is actively being pursued. The large numbers of measurements such approaches yield demand that we pay careful attention to the numerical analysis of data. In this paper we present a novel application of an Affymetrix GeneChip to perform genome-wide screens for deletion polymorphism. A high-density oligonucleotide array formatted for mRNA expression and targeted at a fully sequenced 4.4-million-base-pair Mycobacterium tuberculosis standard strain genome was adapted to compare genomic DNA. Hybridization intensities to 111,000 probe pairs (perfect complement and mismatch complement) were measured for genomic DNA from a clinical strain and from a vaccine organism. Because individual probe-pair hybridization intensities exhibit limited sensitivity/specificity characteristics to detect deletions, we developed a data-analytical methodology that exploits measurements from multiple probes at tandem locations across the genome. The TSTEP (Tandem Set Terminal Extreme Probability) algorithm, designed specifically to analyze these tandem hybridization measurements, was applied and shown to discover genomic deletions with high sensitivity. The TSTEP algorithm provides a foundation for similar efforts to characterize deletions from large numbers of hybridization measurements in similar-sized and larger genomes. Issues relating to the design of genome content screening experiments and the implications of these methods for studying population genomics and the evolution of genomes are discussed.
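
The abstract does not give the details of TSTEP, so the following is only a minimal sketch of the general idea it describes: single probes are unreliable, but a run of adjacent (tandem) probes that all show low signal is unlikely by chance. The sketch assumes each probe pair has already been reduced to a binary "low signal" call, and it scores sliding windows with a plain binomial tail probability. The function names, the window size, the null rate `p_null`, and the cutoff `alpha` are all hypothetical choices for illustration, not values or methods taken from the paper.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def flag_deleted_windows(low_calls, window=5, p_null=0.1, alpha=1e-3):
    """Return start indices of tandem-probe windows with an unlikely excess of low calls.

    low_calls: list of 0/1 flags in genome order (1 = probe pair looks absent).
    window, p_null, alpha: illustrative assumptions, not parameters from the paper.
    """
    flagged = []
    for start in range(len(low_calls) - window + 1):
        k = sum(low_calls[start:start + window])
        # Probability of seeing at least k low calls in this window by chance alone.
        if binom_upper_tail(k, window, p_null) < alpha:
            flagged.append(start)
    return flagged


# Toy data: a run of five consecutive low calls surrounded by sporadic noise.
calls = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0]
print(flag_deleted_windows(calls))  # [4, 5, 6]
```

In this toy example the isolated low calls at positions 2 and 13 are ignored, while the windows overlapping the run at positions 5 to 9 are flagged. This illustrates why pooling tandem probes can raise specificity relative to calling deletions probe by probe.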

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • DNA, Bacterial / analysis*
  • DNA, Bacterial / genetics*
  • Genes, Bacterial / genetics
  • Genome, Bacterial
  • Mycobacterium bovis / genetics
  • Mycobacterium tuberculosis / genetics
  • Oligonucleotide Array Sequence Analysis / methods*
  • Sequence Deletion / genetics*

Substances

  • DNA, Bacterial