Detecting single-feature polymorphisms using oligonucleotide arrays and robustified projection pursuit

Bioinformatics. 2005 Oct 15;21(20):3852-8. doi: 10.1093/bioinformatics/bti640. Epub 2005 Aug 23.

Abstract

Motivation: Genomic DNA has been hybridized to oligonucleotide microarrays to identify single-feature polymorphisms (SFPs) in Arabidopsis, which has a genome of approximately 130 Mb. However, this approach does not work well for organisms with much larger genomes, such as barley (5200 Mb). In the present study, we demonstrate SFP detection using a small number of replicate datasets and complex RNA as a surrogate for barley DNA. To identify the individual probes that define SFPs in the data, we developed a method based on robustified projection pursuit (RPP). For each probe set, this method first evaluates the overall differentiation of signal intensities between two genotypes and then measures how much each probe within the probe set contributes to that overall differentiation.
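The two-stage idea above (score the whole probe set, then apportion that differentiation among its probes) can be illustrated with a deliberately simplified sketch. It is not the authors' RPP algorithm, which is available only on request. Here the "projection" is simply the unit vector of median-centred per-probe genotype differences, and robustness comes only from medians and the MAD. The input layout, the function names `robust_scale` and `probe_set_differentiation`, and the choice to subtract the median difference as a shared expression-level shift are all assumptions made for illustration.

```python
import numpy as np

def robust_scale(x):
    """MAD-based scale estimate (consistent with the SD for normal data)."""
    x = np.asarray(x, dtype=float).ravel()
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def probe_set_differentiation(a, b, eps=1e-6):
    """Hypothetical two-stage score for one probe set (NOT the published RPP).

    a, b: arrays of shape (n_probes, n_replicates) with log2 intensities
    for genotype A and genotype B (assumed layout).
    Returns (overall_score, contributions); contributions sum to 1.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    med_a = np.median(a, axis=1)
    med_b = np.median(b, axis=1)
    # Stage 1: robust per-probe genotype difference.
    d = med_a - med_b
    # Assumption: a shift shared by all probes reflects expression level
    # (RNA is the hybridization target), not sequence, so remove it.
    r = d - np.median(d)
    norm = float(np.linalg.norm(r))
    # Robust noise scale pooled from within-genotype replicate residuals.
    resid = np.concatenate([a - med_a[:, None], b - med_b[:, None]], axis=1)
    sigma = max(robust_scale(resid), eps)
    if norm == 0.0:
        return 0.0, np.zeros_like(r)
    # Unit direction along which the centred difference is largest.
    w = r / norm
    # Stage 2: each probe's share of the squared loading on that direction.
    return norm / sigma, w ** 2

# Toy probe set (hypothetical data): 11 probes, 3 replicates per genotype.
rng = np.random.default_rng(0)
n_probes, n_reps = 11, 3
base = rng.normal(8.0, 1.0, size=(n_probes, 1))
a = base + rng.normal(0.0, 0.1, size=(n_probes, n_reps))
b = base - 1.0 + rng.normal(0.0, 0.1, size=(n_probes, n_reps))  # expression shift
b[4] -= 2.0  # simulated SFP: probe 4 hybridizes poorly in genotype B
score, contrib = probe_set_differentiation(a, b)
print(round(score, 1), int(np.argmax(contrib)))
```

Subtracting the median difference keeps a genome-wide or gene-level expression change from being mistaken for a probe-specific polymorphism. A probe whose loading dominates the direction is then the candidate SFP probe.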

Results: RNA from whole seedlings with and without dehydration stress provided 'present' calls for approximately 75% of probe sets. With triplicate data, at least 80% of the 5% of 'present' probe sets ranked most likely to contain at least one SFP probe were correctly predicted, as determined by direct sequencing of PCR amplicons derived from barley genomic DNA. Using this 5% cutoff and combining three parental genotype comparisons (Steptoe versus Morex, Morex versus Barke, and Oregon Wolfe Barley Dominant versus Recessive), we defined 2007 SFP probes contained in 1684 probe sets.
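The selection and validation arithmetic described in the Results can be sketched as follows. This assumes that "the top 5%" means probe sets scoring at or above the 95th percentile of all 'present' probe sets. The helper names `top_fraction` and `precision` are hypothetical, and the scores and sequencing confirmations are invented toy values, not data from the study.

```python
import numpy as np

def top_fraction(scores, fraction=0.05):
    """Indices of probe sets whose scores lie in the top `fraction` (assumed rule)."""
    scores = np.asarray(scores, dtype=float)
    cutoff = np.quantile(scores, 1.0 - fraction)
    return np.flatnonzero(scores >= cutoff)

def precision(selected, confirmed):
    """Fraction of selected probe sets whose SFP was confirmed, e.g. by sequencing."""
    selected = {int(i) for i in selected}
    if not selected:
        return float("nan")
    return len(selected & set(confirmed)) / len(selected)

scores = np.arange(100, dtype=float)  # hypothetical scores for 100 'present' probe sets
chosen = top_fraction(scores)
print(chosen)                          # → [95 96 97 98 99]
confirmed = {95, 96, 97, 98}           # hypothetical sequencing confirmations
print(precision(chosen, confirmed))    # → 0.8
```

In this toy case, four of the five selected probe sets are confirmed, which corresponds to the 80% level reported for the real data.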

Availability: The algorithm is available upon request from the corresponding author.

Contact: xinping.cui@ucr.edu

Supplementary information: http://faculty.ucr.edu/~xpcui.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Arabidopsis / genetics*
  • Chromosome Mapping / methods*
  • DNA Mutational Analysis / methods*
  • DNA, Plant / genetics*
  • Genome, Plant
  • Oligonucleotide Array Sequence Analysis / methods*
  • Pattern Recognition, Automated / methods
  • Polymorphism, Single Nucleotide / genetics*

Substances

  • DNA, Plant