Use of SNP genotypes to identify carriers of harmful recessive mutations in cattle populations

BMC Genomics. 2016 Nov 3;17(1):857. doi: 10.1186/s12864-016-3218-9.

Abstract

Background: Single nucleotide polymorphism (SNP) genotype data are increasingly available in cattle populations and can be used, among other things, to predict carriers of specific mutations. A practical statistical method for accurately classifying individuals as carriers or non-carriers is therefore valuable. In this paper, we used cross-validation to compare five classification models: Lasso-penalized logistic regression (Lasso), support vector machines with either a linear or a radial kernel (SVML and SVMR), k-nearest neighbors (KNN), and multi-allelic gene prediction (MAG). The models were applied to the identification of carriers of the TUBD1 recessive mutation on BTA19 (Bos taurus autosome 19), which is known to be associated with high calf mortality. A population of 3116 Fleckvieh and 392 Brown Swiss animals genotyped with the 54K SNP chip was available for the analysis.
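The cross-validated comparison described above can be illustrated with a minimal sketch of one of the five models, KNN. It assumes genotypes coded as 0/1/2 allele counts, Manhattan distance between animals, k = 3 neighbours and five folds. These choices, the function names and the toy data are hypothetical illustrations, not the study's actual settings; the study also evaluated Lasso, SVML, SVMR and MAG, which are not shown here.

```python
import random
from collections import Counter


def manhattan(a, b):
    """Distance between two genotype vectors coded as 0/1/2 allele counts (assumed coding)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote of carrier status (1/0) among the k training animals closest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: manhattan(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def kfold_error_rate(X, y, n_folds=5, k=3, seed=0):
    """Misclassification rate pooled over all animals, each predicted once while held out."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    # Split the shuffled animals into n_folds disjoint test folds.
    folds = [idx[f::n_folds] for f in range(n_folds)]
    errors = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            if knn_predict(train_X, train_y, X[i], k) != y[i]:
                errors += 1
    return errors / len(X)


if __name__ == "__main__":
    # Hypothetical toy panel: carriers (1) are heterozygous (1) at four SNPs
    # near the mutation; non-carriers (0) are homozygous (0) at those SNPs.
    X = [[1, 1, 1, 1] for _ in range(20)] + [[0, 0, 0, 0] for _ in range(20)]
    y = [1] * 20 + [0] * 20
    print(kfold_error_rate(X, y))  # 0.0 on this perfectly separable toy data
```

Real genotype panels are far noisier than this toy example; the point is only the structure of the procedure, where every animal is classified by a model that never saw it during training.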

Results: Overall, SNP genotypes proved very effective for identifying mutation carriers. The best predictive models were Lasso, SVML and MAG, with average error rates of 0.2 %, 0.4 % and 0.6 %, respectively, in Fleckvieh, and 1.2 %, 0.9 % and 1.7 % in Brown Swiss. For these three models, the false positive rates were, respectively, 0.1 %, 0.1 % and 0.2 % in Fleckvieh, and 3.0 %, 2.4 % and 1.6 % in Brown Swiss; the false negative rates were 4.4 %, 7.6 % and 1.0 % in Fleckvieh, and 0.0 %, 0.1 % and 0.8 % in Brown Swiss. MAG appeared to be more robust to reductions in sample size: with 25 % of the data, its average error rate was 0.7 % in Fleckvieh and 2.2 % in Brown Swiss, compared with 2.1 % and 5.5 % for Lasso, and 2.6 % and 12.0 % for SVML.
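To make the reported rates concrete, the sketch below computes them from true and predicted carrier status. It assumes the conventional definitions: error rate = misclassified animals / all animals; false positive rate = non-carriers called carriers / all non-carriers; false negative rate = carriers called non-carriers / all carriers. The function name and the counts in the example are hypothetical, not data from the study. The example shows how, when carriers are a small minority, the false negative rate can be much larger than the overall error rate, a pattern consistent with the Fleckvieh figures in the Results.

```python
def classification_rates(y_true, y_pred):
    """Return error, false positive and false negative rates (1 = carrier, 0 = non-carrier)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # carriers correctly called
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-carriers correctly called
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-carriers called carriers
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # carriers missed
    return {
        "error_rate": (fp + fn) / len(pairs),
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical panel: 50 carriers and 950 non-carriers.
    # The classifier misses 2 carriers and flags 1 non-carrier as a carrier.
    y_true = [1] * 50 + [0] * 950
    y_pred = [1] * 48 + [0] * 2 + [1] * 1 + [0] * 949
    for name, value in classification_rates(y_true, y_pred).items():
        print(f"{name}: {value:.2%}")
    # error_rate: 0.30%
    # false_positive_rate: 0.11%
    # false_negative_rate: 4.00%
```

Because the false negative rate is computed only over the (few) true carriers, two missed carriers out of 50 already give 4 %, while the same errors barely move the overall error rate.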

Conclusions: The use of SNP genotypes is a very effective and efficient technique for identifying mutation carriers in cattle populations. Very few misclassifications were observed, both overall and within the carrier and non-carrier classes, indicating that the approach is highly reliable for potential applications in cattle breeding.

Keywords: Carrier identification; Cattle; Haplotypes; KNN; Lasso-penalised logistic regression; MAG; Recessive mutations; SNP genotypes; support vector machines.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Cattle
  • Female
  • Genes, Recessive*
  • Genetic Carrier Screening
  • Genotype*
  • Heterozygote*
  • Male
  • Mutation*
  • Polymorphism, Single Nucleotide*
  • Reproducibility of Results
  • Support Vector Machine