Quality and concordance of genotyping array data of 12,064 samples from 5840 cancer patients

Genomics. 2019 Jul;111(4):950-957. doi: 10.1016/j.ygeno.2018.06.001. Epub 2018 Jun 11.

Abstract

Genotyping arrays characterize genome-wide SNPs for a study cohort and were the primary technology behind genome-wide association studies over the last decade. The Cancer Genome Atlas (TCGA) is one of the largest cancer consortium studies, and it collected genotyping data for all of its participants. Using TCGA SNP data from 12,064 samples genotyped on the Affymetrix 6.0 SNP array, we conducted comprehensive comparisons across DNA sources (tumor tissue, normal tissue, and blood) and sample storage protocols (formalin-fixed paraffin-embedded (FFPE) vs. freshly frozen (FF)), examining genotypes, transition/transversion ratios, and mutation catalogues. During the analysis, we made important observations relevant to data quality. SNP concordance was excellent between blood and normal tissue, and slightly lower between blood and tumor tissue, potentially due to somatic mutations in the tumors. The poor SNP concordance observed between FFPE and FF samples suggested a batch effect. The transition/transversion ratio, a metric commonly used for quality control in exome sequencing projects, appeared less applicable to genotyping array data because of the whole-genome coverage built into the array design. Moreover, there were substantially more loss-of-heterozygosity events than gain-of-heterozygosity events in tumors relative to normal tissue and blood. This might be a consequence of extensive copy number deletions in tumors. In summary, our thorough evaluation calls for more adequate quality control practices and provides guidelines for improved application of TCGA genotyping data.
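
The three quantities compared in the abstract (SNP concordance, the transition/transversion ratio, and loss versus gain of heterozygosity) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The genotype encoding (0 = AA, 1 = AB, 2 = BB, None = no call), the function names, and the definitions of loss and gain of heterozygosity used here are assumptions made for illustration only.

```python
from typing import Optional, Sequence, Tuple

# Assumed encoding: 0 = AA, 1 = AB (heterozygous), 2 = BB, None = no call.
Genotype = Optional[int]

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def concordance(a: Sequence[Genotype], b: Sequence[Genotype]) -> float:
    """Fraction of SNPs called in both samples whose genotypes are identical."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no SNPs called in both samples")
    return sum(x == y for x, y in pairs) / len(pairs)


def titv_ratio(alleles: Sequence[Tuple[str, str]]) -> float:
    """Transitions divided by transversions.

    Assumes every entry is a valid biallelic SNP with two distinct bases,
    so anything that is not a transition is counted as a transversion.
    """
    ti = sum(pair in TRANSITIONS for pair in alleles)
    tv = len(alleles) - ti
    if tv == 0:
        raise ValueError("no transversions; ratio is undefined")
    return ti / tv


def loh_goh(normal: Sequence[Genotype], tumor: Sequence[Genotype]) -> Tuple[int, int]:
    """Count (loss, gain) of heterozygosity in tumor relative to normal.

    Loss: heterozygous in normal, homozygous in tumor.
    Gain: homozygous in normal, heterozygous in tumor.
    Sites with a no call in either sample are skipped.
    """
    loh = goh = 0
    for n, t in zip(normal, tumor):
        if n is None or t is None:
            continue
        if n == 1 and t != 1:
            loh += 1
        elif n != 1 and t == 1:
            goh += 1
    return loh, goh


if __name__ == "__main__":
    # Toy genotypes, invented for illustration; not TCGA data.
    normal = [0, 1, 1, 2, None, 1]
    tumor = [0, 0, 1, 2, 1, 2]
    print(concordance(normal, tumor))
    print(loh_goh(normal, tumor))
    print(titv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]))
```

In this sketch, sites with a no call in either sample are excluded from the concordance denominator. That is one common convention, and it may differ from the one used in the study.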

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Databases, Genetic / standards
  • Genotyping Techniques / methods*
  • Genotyping Techniques / standards
  • Humans
  • Neoplasms / genetics*
  • Polymorphism, Single Nucleotide
  • Tissue Array Analysis / methods*
  • Tissue Array Analysis / standards