Algorithms for large-scale genotyping microarrays

Bioinformatics. 2003 Dec 12;19(18):2397-403. doi: 10.1093/bioinformatics/btg332.

Abstract

Motivation: Analysis of many thousands of single nucleotide polymorphisms (SNPs) across whole genome is crucial to efficiently map disease genes and understanding susceptibility to diseases, drug efficacy and side effects for different populations and individuals. High density oligonucleotide microarrays provide the possibility for such analysis with reasonable cost. Such analysis requires accurate, reliable methods for feature extraction, classification, statistical modeling and filtering.

Results: We propose the modified partitioning around medoids as a classification method for relative allele signals. We use the average silhouette width, separation and other quantities as quality measures for genotyping classification. We form robust statistical models based on the classification results and use these models to make genotype calls and calculate quality measures of calls. We apply our algorithms to several different genotyping microarrays. We use reference types, informative Mendelian relationship in families, and leave-one-out cross validation to verify our results. The concordance rates with the single base extension reference types are 99.36% for the SNPs on autosomes and 99.64% for the SNPs on sex chromosomes. The concordance of the leave-one-out test is over 99.5% and is 99.9% higher for AA, AB and BB cells. We also provide a method to determine the gender of a sample based on the heterozygous call rate of SNPs on the X chromosome. See http://www.affymetrix.com for further information. The microarray data will also be available from the Affymetrix web site.

Availability: The algorithms will be available commercially in the Affymetrix software package.

Publication types

  • Comparative Study
  • Evaluation Study
  • Validation Study

MeSH terms

  • Algorithms*
  • Chromosome Mapping / methods
  • Chromosomes, Human, X / genetics
  • Gene Expression Profiling / methods*
  • Gene Frequency
  • Genotype
  • Oligonucleotide Array Sequence Analysis / methods*
  • Polymorphism, Single Nucleotide / genetics*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*