A roadmap to robust discriminant analysis of principal components

Mol Ecol Resour. 2023 Apr;23(3):519-522. doi: 10.1111/1755-0998.13724. Epub 2022 Nov 6.

Abstract

Identification of population structure is a common goal for a variety of applications, including conservation, wildlife management, and medical genetics. The outcome of these analyses can have far-reaching implications; therefore, it is important to understand the strengths and weaknesses of the methodologies used. Increasing in popularity, the discriminant analysis of principal components (DAPC) method incorporates combinations of genetic variables (alleles) into a model that differentiates individuals into genetic clusters. However, users may not have a full understanding of how best to parameterize the model. In this issue of Molecular Ecology Resources, Thia (2022) looks under the hood of the DAPC. Using simulated data, he demonstrates the importance of careful parameter selection in developing a DAPC model, the implications of over-fitting the model, and finally, how best to evaluate the results of DAPC models. This work highlights the issues that can arise from over-parameterizing the DAPC model when gene flow is high among clusters and provides important guidelines to help researchers draw conclusions that are biologically relevant.
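As a rough illustration of the over-fitting concern described above, the following minimal sketch chains a principal component analysis with a linear discriminant analysis and compares training accuracy with cross-validated accuracy as more principal components are retained. It is an assumption-laden toy, not the procedure used by Thia (2022) and not the adegenet implementation of DAPC: the simulator `simulate_genotypes`, the helper `dapc_scores`, the divergence value, and the choices of 2, 10, and 40 retained components are all hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline


def simulate_genotypes(n_per_pop=30, n_loci=200, n_pops=3, divergence=0.05, seed=0):
    """Hypothetical toy simulator: biallelic genotypes coded 0/1/2 for
    weakly diverged populations (a crude stand-in for high gene flow)."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.2, 0.8, size=n_loci)  # shared ancestral allele frequencies
    X, y = [], []
    for pop in range(n_pops):
        # Each population drifts slightly away from the shared frequencies.
        freqs = np.clip(base + rng.normal(0, divergence, n_loci), 0.01, 0.99)
        X.append(rng.binomial(2, freqs, size=(n_per_pop, n_loci)))
        y.append(np.full(n_per_pop, pop))
    return np.vstack(X).astype(float), np.concatenate(y)


def dapc_scores(X, y, n_pcs, cv=5):
    """Fit a PCA -> LDA pipeline and return (training accuracy, mean CV accuracy)."""
    model = make_pipeline(PCA(n_components=n_pcs), LinearDiscriminantAnalysis())
    train_acc = model.fit(X, y).score(X, y)
    # Cross-validation refits PCA and LDA within each training fold,
    # so held-out individuals never inform the model that scores them.
    cv_acc = cross_val_score(model, X, y, cv=cv).mean()
    return train_acc, cv_acc


X, y = simulate_genotypes()
results = {n_pcs: dapc_scores(X, y, n_pcs) for n_pcs in (2, 10, 40)}
for n_pcs, (train_acc, cv_acc) in results.items():
    print(f"PCs retained: {n_pcs:>2}  train accuracy: {train_acc:.2f}  CV accuracy: {cv_acc:.2f}")
```

A widening gap between training and cross-validated accuracy as more components are retained would be one sign that the model is fitting noise rather than population structure. The exact numbers depend on the simulated data and should not be read as a general result.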

MeSH terms

  • Alleles
  • Animals
  • Animals, Wild*
  • Discriminant Analysis
  • Gene Flow*
  • Humans