The impact of the rank of marker variance-covariance matrix in principal component evaluation for genomic selection applications

J Anim Breed Genet. 2011 Dec;128(6):440-5. doi: 10.1111/j.1439-0388.2011.00957.x. Epub 2011 Sep 12.

Abstract

In genomic selection (GS) programmes, direct genomic values (DGV) are evaluated using information provided by high-density SNP chips. Because DGV accuracy depends strictly on SNP density, further increases in the number of markers per chip are likely to have severe computational consequences. The aim of the present work was to test the effectiveness of principal component analysis (PCA), carried out by chromosome, in reducing marker dimensionality for GS purposes. A simulated data set of 5700 individuals genotyped for an equal number of SNPs (5700) distributed over six chromosomes was used. Principal components (PCs) were extracted both genome-wide (ALL) and separately by chromosome (CHR) and used to predict DGVs. In the ALL scenario, the SNP variance-covariance matrix (S) was singular (positive semi-definite but not positive definite) and contained null information, which introduced 'spuriousness' into the derived results. By contrast, the S matrix of each chromosome (CHR scenario) had full rank. DGV accuracies were consistently higher for CHR than for ALL. Moreover, in the ALL scenario, DGV accuracies soon became unstable as the number of animals decreased, whereas in CHR they remained stable down to 900-1000 individuals. In real applications using a 54k SNP chip, the largest number of markers per chromosome is approximately 2500. Thus, around 3000 genotyped animals could yield reliable results when the original SNP variables are replaced by a reduced number of PCs.
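The rank argument can be illustrated with a small sketch. After centring, an n x p genotype matrix has rank at most n - 1, so the p x p matrix S is necessarily singular whenever p >= n, as in a genome-wide analysis with as many SNPs as animals. Splitting the SNPs by chromosome gives smaller matrices that can be full rank. By the same reasoning, a chromosome carrying about 2500 markers needs more than 2500 genotyped animals before its S can be full rank. The toy sizes (600 animals, 6 chromosomes x 100 SNPs), the unlinked binomial genotypes and the helper name `covariance_rank` are assumptions for illustration, not the paper's simulation.

```python
import numpy as np

def covariance_rank(genotypes: np.ndarray) -> int:
    """Numerical rank of the p x p SNP variance-covariance matrix S
    built from an n x p genotype matrix coded 0/1/2."""
    centered = genotypes - genotypes.mean(axis=0)      # centre each SNP
    s = centered.T @ centered / (genotypes.shape[0] - 1)
    return int(np.linalg.matrix_rank(s))

rng = np.random.default_rng(1)
n_animals, n_chr, snps_per_chr = 600, 6, 100           # hypothetical toy sizes
p = n_chr * snps_per_chr                               # 600 SNPs, as many as animals
x = rng.binomial(2, 0.5, size=(n_animals, p)).astype(float)

# ALL: one genome-wide S; centring leaves at most n - 1 = 599 independent rows
rank_all = covariance_rank(x)
# CHR: one 100 x 100 S per chromosome, estimated from 600 animals
ranks_chr = [covariance_rank(x[:, c * snps_per_chr:(c + 1) * snps_per_chr])
             for c in range(n_chr)]

print(f"ALL: rank(S) = {rank_all} of {p} -> singular")
print(f"CHR: ranks = {ranks_chr} -> each full rank")
```

`np.linalg.matrix_rank` counts singular values above a default tolerance. The genome-wide S therefore reports a rank below p, while each per-chromosome S reports its full dimension.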
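The CHR strategy can be sketched as follows. For each chromosome, the SNPs are centred and the chromosome's own S is eigen-decomposed. The leading PCs then replace the original SNPs. The PC scores are used as predictors of phenotype, and DGV accuracy is measured as the correlation between predicted DGV and true breeding values in a validation set. Several choices here are assumptions for illustration, not the authors' settings:

- the genotype simulator, which induces LD via a thresholded AR(1) process along each chromosome;
- the hypothetical cut-off of PCs explaining 90% of each chromosome's variance;
- the ordinary least-squares fit;
- the training/validation split;
- all sizes.

```python
import numpy as np

def simulate_genotypes(rng, n, n_chr, snps_per_chr, rho=0.95):
    """Toy 0/1/2 genotypes with LD: each haplotype is a thresholded AR(1)
    process along the chromosome (hypothetical, not the paper's simulation)."""
    def haplotype():
        e = rng.standard_normal((n, n_chr, snps_per_chr))
        z = np.empty_like(e)
        z[:, :, 0] = e[:, :, 0]
        for j in range(1, snps_per_chr):
            z[:, :, j] = rho * z[:, :, j - 1] + np.sqrt(1 - rho**2) * e[:, :, j]
        return (z > 0).reshape(n, n_chr * snps_per_chr).astype(float)
    return haplotype() + haplotype()

def chromosome_pcs(genotypes, n_chr, var_kept=0.90):
    """CHR scenario: replace each chromosome's SNPs by the leading PCs of its
    own variance-covariance matrix, keeping enough PCs to explain var_kept."""
    n, p = genotypes.shape
    m = p // n_chr
    scores = []
    for c in range(n_chr):
        xc = genotypes[:, c * m:(c + 1) * m]
        xc = xc - xc.mean(axis=0)                      # centre each SNP
        s = xc.T @ xc / (n - 1)                        # S for this chromosome
        eigval, eigvec = np.linalg.eigh(s)             # ascending eigenvalues
        eigval, eigvec = eigval[::-1], eigvec[:, ::-1]
        k = int(np.searchsorted(np.cumsum(eigval) / eigval.sum(), var_kept)) + 1
        scores.append(xc @ eigvec[:, :k])              # n x k PC scores
    return np.hstack(scores)

rng = np.random.default_rng(2011)
n, n_chr, m = 2000, 6, 100                             # hypothetical toy sizes
x = simulate_genotypes(rng, n, n_chr, m)
tbv = (x - x.mean(axis=0)) @ rng.standard_normal(n_chr * m)  # true breeding values
y = tbv + rng.normal(0.0, tbv.std(), n)                # phenotypes, heritability ~0.5

pcs = chromosome_pcs(x, n_chr)                         # uses genotypes only
design = np.column_stack([np.ones(n), pcs])            # intercept + PC scores
train, valid = slice(0, 1500), slice(1500, n)
beta, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
dgv = design[valid] @ beta                             # predicted DGV
accuracy = np.corrcoef(dgv, tbv[valid])[0, 1]
print(f"{pcs.shape[1]} PCs replace {n_chr * m} SNPs; DGV accuracy = {accuracy:.2f}")
```

In this sketch the PCs are extracted from the genotypes of all animals, which is possible because no phenotypes are involved. Only the regression on PC scores is restricted to the training animals. A genome-wide (ALL) variant would simply call `chromosome_pcs(x, 1)`.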

MeSH terms

  • Analysis of Variance
  • Animals
  • Breeding / methods*
  • Genetic Markers / genetics*
  • Genomics / methods*
  • Polymorphism, Single Nucleotide
  • Principal Component Analysis / methods*

Substances

  • Genetic Markers