Detection and restoration of hybridization problems in Affymetrix GeneChip data by parametric scanning

Genome Inform. 2006;17(2):100-9.

Abstract

Gene expression microarray data often contain defects caused by uneven hybridization and dust contamination. Such defects should be removed prior to analysis to prevent loss of analytical accuracy and false positive results. This paper presents a parameter-scanning algorithm that detects such defects on the basis of the character of the data distributions. The cell data are scanned exhaustively using a window algorithm, and windows with an index value greater than a threshold are recognized as defects and removed from the array data. The index is found from the differences between the target and an ideal standard of hybridization obtained as a trimmed mean among experiments, representing the statistical center of differences in each section. The threshold is derived as a screening level designated by the operator, but has only a limited effect on the effectiveness of data cancellation. The validity of the algorithm and the effects of data cancellation are tested using GeneChip data obtained from a series of experiments. The algorithm is shown to greatly improve the reproducibility of measurements while removing only a small amount of faultless data.
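
The window scan described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`trimmed_mean`, `scan_defects`), the window size, step, trimming fraction, and fixed threshold are all hypothetical choices made for illustration, as is the use of the absolute value of the index and of a trimmed mean as the "statistical center" within each window. The input is synthetic log-scale data, not GeneChip measurements.

```python
import numpy as np

def trimmed_mean(values, cut=0.2, axis=0):
    """Mean after dropping the lowest and highest `cut` fraction along `axis`."""
    values = np.sort(np.asarray(values, dtype=float), axis=axis)
    n = values.shape[axis]
    k = int(cut * n)  # number of values dropped at each end
    kept = np.take(values, np.arange(k, n - k), axis=axis)
    return kept.mean(axis=axis)

def scan_defects(target, standard, window=8, step=4, threshold=0.5, cut=0.2):
    """Slide a square window over (target - standard) and flag windows whose
    index exceeds the threshold. All parameter values are assumptions."""
    diff = target - standard
    mask = np.zeros(diff.shape, dtype=bool)
    rows, cols = diff.shape
    for r in range(0, rows - window + 1, step):
        for c in range(0, cols - window + 1, step):
            block = diff[r:r + window, c:c + window].ravel()
            index = trimmed_mean(block, cut)  # assumed "statistical center"
            if abs(index) > threshold:
                mask[r:r + window, c:c + window] = True
    # Flagged cells are cancelled (set to NaN) rather than corrected.
    cleaned = np.where(mask, np.nan, target)
    return cleaned, mask

# Synthetic example: five replicate arrays, one with a local smear.
rng = np.random.default_rng(0)
arrays = rng.normal(8.0, 0.1, size=(5, 32, 32))
arrays[0, 8:16, 8:16] += 2.0  # artificial hybridization defect

# Assumed "ideal standard": per-cell trimmed mean across experiments.
standard = trimmed_mean(arrays, cut=0.2, axis=0)
cleaned, mask = scan_defects(arrays[0], standard)
print("cells flagged:", int(mask.sum()))
```

Because the windows overlap, a window that only partly covers the defect can also exceed the threshold, so some faultless neighbouring cells are cancelled along with the defective ones.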

Publication types

  • Comparative Study

MeSH terms

  • Algorithms*
  • Arabidopsis / genetics
  • Data Interpretation, Statistical
  • False Positive Reactions
  • Gene Expression Profiling*
  • Humans
  • Nucleic Acid Hybridization*
  • Oligonucleotide Array Sequence Analysis / methods
  • Oligonucleotide Array Sequence Analysis / standards*
  • Reproducibility of Results
  • Sensitivity and Specificity