Genetic and nongenetic variation revealed for the principal components of human gene expression

Genetics. 2013 Nov;195(3):1117-28. doi: 10.1534/genetics.113.153221. Epub 2013 Sep 11.

Abstract

Principal components analysis (PCA) has been used in gene expression studies to correct for population substructure, batch effects, and environmental effects. This approach typically removes the variation contained in as many as 50 principal components (PCs), which can constitute a large proportion of the total variation in the data. Each PC, however, can capture many sources of variation, including gene expression networks and genetic variation influencing transcript levels. We demonstrate that PCs generated from gene expression data can simultaneously contain both genetic and nongenetic factors. Using heritability estimates, we show that all PCs contain a considerable portion of genetic variation, while nongenetic artifacts such as batch effects were associated, to varying degrees, with the first 60 PCs. These PCs show enrichment for biological pathways, including core immune function and metabolic pathways. Applying PC correction in two independent data sets reduced the number of cis- and trans-expression QTL (eQTL) detected. Comparisons of PC correction with linear model correction revealed that PC correction was less efficient at removing known batch effects and imposed a greater penalty on genetic variation. This study therefore highlights the danger of eliminating biologically relevant data when applying PC correction to gene expression data.
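The contrast between PC correction and linear model correction can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' pipeline or data: the expression matrix, the hypothetical SNP `genotype` acting on 100 genes (a trans-eQTL-like effect), the binary `batch` indicator, the effect sizes, and the choice of `k=5` removed PCs are all invented for illustration. Here "PC correction" means subtracting the rank-k SVD reconstruction of the gene-centered matrix, and "linear model correction" means ordinary least-squares regression of each gene on the known batch indicator, keeping the residuals.

```python
import numpy as np


def pc_correct(expr: np.ndarray, k: int) -> np.ndarray:
    """Remove the top-k principal components from a samples x genes matrix."""
    centered = expr - expr.mean(axis=0)
    # Economy SVD: centered = U @ diag(s) @ Vt; U[:, :k] * s[:k] are the top-k PC scores.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    top_k = (u[:, :k] * s[:k]) @ vt[:k, :]
    return centered - top_k


def lm_correct(expr: np.ndarray, covariates: np.ndarray) -> np.ndarray:
    """Regress every gene on known covariates (plus an intercept); return residuals."""
    design = np.column_stack([np.ones(expr.shape[0]), covariates])
    beta, *_ = np.linalg.lstsq(design, expr, rcond=None)
    return expr - design @ beta


def mean_abs_corr(x: np.ndarray, expr: np.ndarray) -> float:
    """Mean absolute Pearson correlation between one sample-level vector and each gene."""
    xc = (x - x.mean()) / x.std()
    ec = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    return float(np.mean(np.abs(xc @ ec) / len(x)))


# Hypothetical simulated data (assumption: all sizes and effects are arbitrary).
rng = np.random.default_rng(0)
n_samples, n_genes, n_target = 200, 500, 100
genotype = rng.binomial(2, 0.5, size=n_samples).astype(float)  # hypothetical SNP dosage 0/1/2
batch = rng.integers(0, 2, size=n_samples).astype(float)       # known processing batch
expr = rng.normal(size=(n_samples, n_genes))                    # residual noise
expr += np.outer(batch, rng.normal(2.0, 0.5, size=n_genes))     # batch shifts every gene
expr[:, :n_target] += np.outer(genotype, np.ones(n_target))     # SNP drives the first 100 genes

pc_resid = pc_correct(expr, k=5)
lm_resid = lm_correct(expr, batch)

for label, resid in [("PC correction (k=5)", pc_resid), ("linear model", lm_resid)]:
    print(
        f"{label}: |r| with batch = {mean_abs_corr(batch, resid):.3f}, "
        f"|r| with SNP in target genes = {mean_abs_corr(genotype, resid[:, :n_target]):.3f}"
    )
```

Because the simulated SNP shifts many genes together, its signal loads onto one of the leading PCs, so removing the top PCs should strip most of the genotype association from the target genes, whereas regressing on the known batch indicator removes the batch effect while leaving the genetic signal largely intact. This toy setup does not attempt to reproduce the paper's finding about residual batch effects under PC correction.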

Keywords: batch effects; gene expression; heritability; linear models; normalization; principal components analysis.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Female
  • Gene Expression Profiling / statistics & numerical data*
  • Genetic Variation*
  • Genome-Wide Association Study / statistics & numerical data
  • Humans
  • Linear Models
  • Male
  • Models, Genetic
  • Principal Component Analysis
  • Quantitative Trait Loci
  • Quantitative Trait, Heritable