High confidence copy number variants identified in Holstein dairy cattle from whole genome sequence and genotype array data

Adrien M Butty; Tatiane C S Chud; Filippo Miglior; Flavio S Schenkel; Arun Kommadath; Kirill Krivushin; Jason R Grant; Irene M Häfliger; Cord Drögemüller; Angela Cánovas; Paul Stothard; Christine F Baes

doi:10.1038/s41598-020-64680-3

Sci Rep. 2020 May 15;10(1):8044. doi: 10.1038/s41598-020-64680-3.

Affiliations

¹ Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, Canada.
² Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada.
³ Lacombe Research and Development Centre, Agriculture and Agri-Food Canada, Lacombe, AB, Canada.
⁴ Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, BE, Switzerland.
⁵ Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, Canada. cbaes@uoguelph.ca.
⁶ Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, BE, Switzerland. cbaes@uoguelph.ca.

Abstract

Multiple methods to detect copy number variants (CNV) relying on different types of data have been developed, and CNV have been shown to affect numerous economically important traits in cattle, such as reproduction and immunity. CNV detection still needs improvement with regard to the trade-off between a high true positive rate and a low false positive rate of variant identification. Instead of improving single CNV detection methods, variants can be identified in silico with high confidence when multiple methods and datasets are combined. Here, CNV were identified from whole-genome sequence (WGS) and genotype array (GEN) data on 96 Holstein animals. After CNV detection, two sets of high confidence CNV regions (CNVR) were created that contained variants found in both WGS and GEN data, following an animal-based (n = 52) and a population-based (n = 36) pipeline. Furthermore, the change in false positive CNV identification rates with different GEN marker densities was evaluated. The population-based approach characterized CNVR that were more often shared among animals (on average 40% more samples per CNVR) and were more often linked to putative functions (48 vs 56% of CNVR) than CNV identified with the animal-based approach. Moreover, false positive identification rates of up to 22% were estimated on GEN information. Further research using larger datasets should use a population-wide approach to identify high confidence CNVR.
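The idea of keeping only variants supported by both data types can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the toy coordinates, the closed-interval convention, and the "any overlap" criterion for merging and intersecting are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_cnv_to_cnvr(cnvs):
    """Merge overlapping CNV (chrom, start, end) into CNV regions (CNVR).

    Assumption: closed intervals, and any overlap is enough to merge.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in cnvs:
        by_chrom[chrom].append((start, end))

    regions = []
    for chrom in sorted(by_chrom):
        merged = []
        for start, end in sorted(by_chrom[chrom]):
            if merged and start <= merged[-1][1]:
                # Overlaps the previous region: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        regions.extend((chrom, s, e) for s, e in merged)
    return regions


def high_confidence_cnvr(wgs_cnvr, gen_cnvr):
    """Return the overlapping part of regions found in both data sources."""
    shared = []
    for chrom_w, s_w, e_w in wgs_cnvr:
        for chrom_g, s_g, e_g in gen_cnvr:
            if chrom_w == chrom_g and s_w <= e_g and s_g <= e_w:
                shared.append((chrom_w, max(s_w, s_g), min(e_w, e_g)))
    return shared


# Hypothetical toy calls, not data from the study.
wgs_calls = [("1", 100, 500), ("1", 400, 900), ("2", 50, 200)]
gen_calls = [("1", 800, 1200), ("3", 10, 20)]

wgs_regions = merge_cnv_to_cnvr(wgs_calls)
gen_regions = merge_cnv_to_cnvr(gen_calls)
print(high_confidence_cnvr(wgs_regions, gen_regions))  # [('1', 800, 900)]
```

In this sketch, pooling the calls of all animals before merging corresponds loosely to a population-based view, whereas running the same two steps separately for each animal corresponds to an animal-based view.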

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Animals
Breeding
Cattle
Chromosome Mapping
Computational Biology / methods
DNA Copy Number Variations*
Genetic Markers
Genome*
Genomics / methods
Genotype*
Whole Genome Sequencing*

Substances

Genetic Markers