Predicting Dog Phenotypes from Genotypes

Emily R Bartusiak; Miriam Barrabes; Aigerim Rymbekova; Julia Gimbernat-Mayol; Cayetana Lopez; Lorenzo Barberis; Daniel Mas Montserrat; Xavier Giro-I-Nieto; Alexander G Ioannidis

doi:10.1109/EMBC48229.2022.9870905

Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul:2022:3558-3562. doi: 10.1109/EMBC48229.2022.9870905.

PMID: 36085664

Abstract

We analyze dog genotypes (i.e., positions in dog DNA sequences that often vary between dogs) in order to predict the corresponding phenotypes (i.e., unique observed characteristics). More specifically, given chromosome data from a dog, we aim to predict its breed, height, and weight. We explore a variety of linear and non-linear classification and regression techniques to accomplish these three tasks. We also investigate the use of a neural network (in both linear and non-linear modes) for breed classification and compare its performance to that of traditional statistical methods. We show that linear methods generally match or outperform non-linear methods for breed classification, whereas the reverse is true for height and weight regression. Finally, we evaluate the results of all of these methods as a function of the number of input features used in the analysis. We conduct experiments using different fractions of the full genomic sequences, resulting in input sequences ranging from 20 single-nucleotide polymorphisms (SNPs) to ∼200k SNPs. In doing so, we explore the impact of using a very limited number of SNPs for prediction. Our experiments demonstrate that these phenotypes in dogs can be predicted with as few as 0.5% of randomly selected SNPs (i.e., 992 SNPs) and that dog breeds can be classified with 50% balanced accuracy with as few as 0.02% of SNPs (i.e., 40 SNPs).
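
The evaluation described in the abstract (classifying breed from a random fraction of SNPs and scoring with balanced accuracy) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the genotypes are synthetic 0/1/2 allele counts rather than real dog data, the nearest-centroid classifier is a simple stand-in and not one of the methods evaluated in the paper, and the function names (`balanced_accuracy`, `random_snp_subset`, `run_demo`) are hypothetical.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare breeds weigh as much as common ones."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


def random_snp_subset(n_snps, fraction, rng):
    """Pick a random fraction of SNP column indices without replacement."""
    k = max(1, int(round(n_snps * fraction)))
    return np.sort(rng.choice(n_snps, size=k, replace=False))


def nearest_centroid_fit(X, y):
    """Store one mean genotype vector per breed (stand-in classifier)."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def nearest_centroid_predict(X, classes, centroids):
    """Assign each dog to the breed with the closest centroid."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


def run_demo(seed=0, n_breeds=5, dogs_per_breed=40, n_snps=2000,
             fractions=(1.0, 0.05, 0.005)):
    """Score breed prediction on synthetic data for several SNP fractions."""
    rng = np.random.default_rng(seed)
    # Hypothetical per-breed allele frequencies; real data would be loaded
    # from genotype files instead.
    freqs = rng.uniform(0.05, 0.95, size=(n_breeds, n_snps))
    y = np.repeat(np.arange(n_breeds), dogs_per_breed)
    X = rng.binomial(2, freqs[y]).astype(float)  # 0/1/2 allele counts

    # Shuffle, then hold out a quarter of the dogs for evaluation.
    order = rng.permutation(len(y))
    X, y = X[order], y[order]
    n_test = len(y) // 4
    X_test, y_test = X[:n_test], y[:n_test]
    X_train, y_train = X[n_test:], y[n_test:]

    results = {}
    for frac in fractions:
        cols = random_snp_subset(n_snps, frac, rng)
        classes, centroids = nearest_centroid_fit(X_train[:, cols], y_train)
        y_pred = nearest_centroid_predict(X_test[:, cols], classes, centroids)
        results[frac] = balanced_accuracy(y_test, y_pred)
    return results


if __name__ == "__main__":
    for frac, score in run_demo().items():
        print(f"fraction={frac:g}  balanced accuracy={score:.3f}")
```

Balanced accuracy is used here because breed sample sizes typically differ, and plain accuracy would be dominated by the most common breeds. The synthetic breeds in this sketch are far more separable than real breeds are likely to be, so the scores it prints say nothing about the paper's results.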

MeSH terms

Animals
Dogs
Genomics*
Genotype
Neural Networks, Computer
Phenotype
Polymorphism, Single Nucleotide*