The impact of the rank of marker variance-covariance matrix in principal component evaluation for genomic selection applications

C Dimauro; M Cellesi; M A Pintus; N P P Macciotta

doi:10.1111/j.1439-0388.2011.00957.x

J Anim Breed Genet. 2011 Dec;128(6):440-5. doi: 10.1111/j.1439-0388.2011.00957.x. Epub 2011 Sep 12.

Authors

C Dimauro¹, M Cellesi, M A Pintus, N P P Macciotta

Affiliation

¹ Dipartimento di Scienze Zootecniche, Università di Sassari, via De Nicola, Sassari, Italy. dimauro@uniss.it

PMID: 22059577
DOI: 10.1111/j.1439-0388.2011.00957.x

Abstract

In genomic selection (GS) programmes, direct genomic values (DGV) are evaluated using information provided by high-density SNP chips. Because DGV accuracy depends strictly on SNP density, an increase in the number of markers per chip is likely to have severe computational consequences. The aim of the present work was to test the effectiveness of principal component analysis (PCA) carried out by chromosome in reducing marker dimensionality for GS purposes. A simulated data set of 5700 individuals, with an equal number of SNPs distributed over six chromosomes, was used. Principal components (PCs) were extracted both genome-wide (ALL) and separately by chromosome (CHR) and used to predict DGVs. In the ALL scenario, the SNP variance-covariance matrix (S) was singular and positive semi-definite, and it contained null information that introduces 'spuriousness' into the derived results. In contrast, the S matrix for each chromosome (CHR scenario) had full rank. DGV accuracies were always higher for CHR than for ALL. Moreover, in the ALL scenario, DGV accuracies soon became unstable as the number of animals decreased, whereas in CHR they remained stable down to 900-1000 individuals. In real applications where a 54k SNP chip is used, the largest number of markers per chromosome is approximately 2500. Thus, around 3000 genotyped animals could yield reliable results when the original SNP variables are replaced by a reduced number of PCs.
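
The rank argument in the abstract can be illustrated with a small numerical sketch. It is not the authors' code or data: the dimensions are toy values, the genotypes are drawn independently at random (no linkage disequilibrium), and the names `cov_rank`, `snp_per_chr`, `rank_all` and `ranks_chr` are hypothetical. It also assumes that "an equal number of SNP" means as many SNPs as individuals. Under those assumptions, the genome-wide S matrix cannot have full rank, because centring the genotypes leaves at most n - 1 independent rows, while each per-chromosome S matrix, built from far fewer SNPs than individuals, can.

```python
import numpy as np

rng = np.random.default_rng(0)

n_chr = 6                  # six chromosomes, as in the abstract
snp_per_chr = 20           # toy value (assumption), far smaller than the study
p = n_chr * snp_per_chr    # total number of SNPs
n = p                      # as many individuals as SNPs (assumed reading)

# Toy genotypes coded 0/1/2, drawn independently (assumption: no LD).
X = rng.binomial(2, 0.5, size=(n, p)).astype(float)


def cov_rank(genotypes):
    """Rank of the SNP variance-covariance matrix S = Xc'Xc / (n - 1).

    S has the same rank as the column-centred matrix Xc, so the rank is
    taken from Xc directly, which is numerically better conditioned.
    """
    centred = genotypes - genotypes.mean(axis=0)
    return int(np.linalg.matrix_rank(centred))


# ALL scenario: one S matrix built from every SNP in the genome.
rank_all = cov_rank(X)

# CHR scenario: one S matrix per chromosome (equal-sized column blocks).
chr_blocks = np.split(X, n_chr, axis=1)
ranks_chr = [cov_rank(block) for block in chr_blocks]

print(f"ALL: S is {p} x {p}, rank {rank_all} (at most n - 1 = {n - 1})")
for i, r in enumerate(ranks_chr, start=1):
    print(f"CHR {i}: S is {snp_per_chr} x {snp_per_chr}, rank {r}")
```

In this toy setting the genome-wide rank is capped at n - 1, so S is singular whenever the number of SNPs is at least the number of individuals, whereas splitting by chromosome keeps the number of variables per matrix well below the number of individuals.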

MeSH terms

Analysis of Variance
Animals
Breeding / methods*
Genetic Markers / genetics*
Genomics / methods*
Polymorphism, Single Nucleotide
Principal Component Analysis / methods*

Substances

Genetic Markers