A PCA-based method for ancestral informative markers selection in structured populations

Sci China C Life Sci. 2009 Oct;52(10):972-6. doi: 10.1007/s11427-009-0128-y. Epub 2009 Nov 13.

Abstract

Identification of population structure can help trace population histories and identify disease genes. Structured association (SA) is a commonly used approach for population structure identification and association mapping. A major issue with SA is that its performance depends greatly on the informativeness and the number of ancestral informative markers (AIMs). Most current AIM selection methods require prior individual ancestry information, which is usually unavailable or uncertain in practice. To address this potential weakness, we herein develop a novel approach for AIM selection based on principal component analysis (PCA), which does not require prior ancestry information of study subjects. Our simulation and real genetic data analysis results suggest that, with equivalent AIMs, PCA-based selected AIMs can significantly increase the accuracy of inferred individual ancestries compared with traditional randomly selected AIMs. Our method can easily be applied to whole genome data to select a set of AIMs that are highly informative of population structure, which can then be used to identify potential population structure and correct possible statistical biases caused by population stratification.
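The abstract does not state how markers are scored, so the following is only a minimal sketch of one plausible way PCA could rank markers without ancestry labels. It assumes genotypes coded as 0/1/2 allele counts and ranks each SNP by its variance-weighted squared loading on the leading principal components. The function names (simulate_genotypes, select_aims_by_pca), the scoring rule, and the simulated two-population data are all illustrative assumptions, not the authors' published procedure.

```python
import numpy as np


def simulate_genotypes(n_per_pop=50, n_snps=200, n_informative=20, seed=0):
    """Toy data (assumption): two populations, genotypes coded 0/1/2."""
    rng = np.random.default_rng(seed)
    # Most SNPs share one allele frequency across both populations.
    freq_a = rng.uniform(0.2, 0.8, size=n_snps)
    freq_b = freq_a.copy()
    # The first n_informative SNPs differ strongly between populations.
    freq_a[:n_informative] = 0.9
    freq_b[:n_informative] = 0.1
    geno_a = rng.binomial(2, freq_a, size=(n_per_pop, n_snps))
    geno_b = rng.binomial(2, freq_b, size=(n_per_pop, n_snps))
    return np.vstack([geno_a, geno_b]).astype(float)


def select_aims_by_pca(genotypes, n_components=1, n_markers=20):
    """Rank SNPs by loading on the top PCs; no ancestry labels are used.

    The scoring rule here is an assumption made for illustration.
    """
    # Center each SNP (column) so the SVD corresponds to PCA.
    centered = genotypes - genotypes.mean(axis=0)
    # Rows of vt are the principal axes, i.e. per-SNP loadings.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    # Weight squared loadings by each component's (unnormalized) variance.
    weights = singular_values[:n_components] ** 2
    scores = (weights[:, None] * vt[:n_components] ** 2).sum(axis=0)
    # Indices of the highest-scoring SNPs, best first.
    return np.argsort(scores)[::-1][:n_markers]


if __name__ == "__main__":
    genotypes = simulate_genotypes()
    selected = select_aims_by_pca(genotypes, n_components=1, n_markers=20)
    print("Selected SNP indices:", sorted(selected.tolist()))
```

In this toy setup the selected indices can be compared with the SNPs that were simulated as differentiated. With real data, the chosen markers would instead be passed to a structured association analysis, and the number of components retained would need to be chosen for the data at hand.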

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Genetic Markers / genetics*
  • Genetics, Population / methods*
  • Genotype
  • Humans
  • Models, Genetic
  • Polymorphism, Single Nucleotide
  • Principal Component Analysis*
  • Regression Analysis

Substances

  • Genetic Markers