Pre-selection of most significant SNPs for the estimation of genomic breeding values

BMC Proc. 2009 Feb 23;3 Suppl 1(Suppl 1):S14. doi: 10.1186/1753-6561-3-s1-s14.

Abstract

The availability of a large number of SNP markers throughout the genome of different livestock species offers the opportunity to estimate genomic breeding values (GEBVs). However, the estimation of many effects in a data set of limited size represents a severe statistical problem. A pre-selection of SNPs based on single-SNP regression may provide a reasonable compromise between accuracy of results, number of independent variables to be considered, and computing requirements.

A total of 595 and 618 SNPs were pre-selected using a simple linear regression for each SNP, based on phenotypes or polygenic EBVs, respectively, with an average distance of 9-10 cM between them. Chromosome four had the highest frequency of selected SNPs. Average correlations between GEBVs and true breeding values (TBVs) were about 0.82 and 0.73 for the TRAINING generations when phenotypes or polygenic EBVs, respectively, were used as the dependent variable, and decreased to 0.66 and 0.54 for the PREDICTION generations.

The pre-selection of SNPs using the phenotypes as the dependent variable, together with a BLUP estimation of marker genotype effects in which each marker was assigned a variance contribution of σ²a/nSNPs, resulted in a remarkable accuracy of GEBV estimation (0.77) in the PREDICTION generations.
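The two-step procedure summarized above (single-SNP regression for pre-selection, then BLUP of marker effects with a per-marker variance of σ²a/nSNPs) can be illustrated with a small sketch. This is not the authors' implementation. The simulated data, the function names, the ranking of SNPs by regression F statistic, and the fixed number of SNPs kept are all assumptions made for illustration; the abstract does not state the selection threshold. σ²a is read here as the additive genetic variance, and the residual variance is treated as known.

```python
import random

def f_statistic(x, y):
    """F statistic of the simple linear regression of y on one SNP x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0:
        return 0.0  # monomorphic SNP carries no information
    ss_reg = sxy * sxy / sxx
    ss_res = syy - ss_reg
    return float("inf") if ss_res <= 0.0 else ss_reg / (ss_res / (n - 2))

def preselect(geno, y, n_keep):
    """Assumed rule: keep the n_keep SNPs with the largest F statistics."""
    f = [f_statistic([row[j] for row in geno], y) for j in range(len(geno[0]))]
    ranked = sorted(range(len(f)), key=lambda j: f[j], reverse=True)
    return sorted(ranked[:n_keep])

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def blup_marker_effects(geno, y, snps, var_a, var_e):
    """BLUP of marker effects; each marker gets variance var_a / len(snps)."""
    n, k = len(y), len(snps)
    lam = var_e / (var_a / k)  # ratio of residual to per-marker variance
    my = sum(y) / n
    means = [sum(row[j] for row in geno) / n for j in snps]
    # Centred genotypes make the overall mean orthogonal to the marker effects.
    z = [[row[j] - means[i] for i, j in enumerate(snps)] for row in geno]
    lhs = [[sum(z[r][a] * z[r][b] for r in range(n)) + (lam if a == b else 0.0)
            for b in range(k)] for a in range(k)]
    rhs = [sum(z[r][a] * (y[r] - my) for r in range(n)) for a in range(k)]
    return means, solve(lhs, rhs)

def predict(geno, snps, means, effects):
    """GEBV as the sum of centred genotypes times estimated marker effects."""
    return [sum((row[j] - means[i]) * effects[i] for i, j in enumerate(snps))
            for row in geno]

def correlation(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

def simulate(n_animals=300, n_snps=30, causal=(3, 11, 20), effect=2.0, seed=1):
    """Hypothetical toy data: 0/1/2 genotypes, three causal SNPs, unit noise."""
    rng = random.Random(seed)
    geno = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_snps)]
            for _ in range(n_animals)]
    tbv = [sum(effect * row[j] for j in causal) for row in geno]
    y = [g + rng.gauss(0.0, 1.0) for g in tbv]
    return geno, tbv, y

geno, tbv, y = simulate()
snps = preselect(geno, y, n_keep=3)
# var_a = 3 causal SNPs * effect^2 * genotype variance (0.5) = 6 in this toy setup
means, effects = blup_marker_effects(geno, y, snps, var_a=6.0, var_e=1.0)
gebv = predict(geno, snps, means, effects)
print(snps, [round(e, 2) for e in effects], round(correlation(gebv, tbv), 2))
```

The correlation printed here is computed on the same animals used for estimation, so it corresponds to the TRAINING case. Reproducing the PREDICTION case would require applying the estimated effects to genotypes of animals that were left out of both the pre-selection and the BLUP step.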