Genotyping Polyploids from Messy Sequencing Data

David Gerard; Luis Felipe Ventorim Ferrão; Antonio Augusto Franco Garcia; Matthew Stephens

doi:10.1534/genetics.118.301468

Genetics. 2018 Nov;210(3):789-807. doi: 10.1534/genetics.118.301468. Epub 2018 Sep 5.

Authors

David Gerard¹, Luis Felipe Ventorim Ferrão², Antonio Augusto Franco Garcia³, Matthew Stephens⁴,⁵

Affiliations

¹ Department of Mathematics and Statistics, American University, Washington, DC 20016. dgerard@american.edu
² Horticultural Sciences Department, University of Florida, Gainesville, Florida 32611.
³ Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, 13418-900, Brazil.
⁴ Department of Human Genetics, University of Chicago, Illinois 60637.
⁵ Department of Statistics, University of Chicago, Illinois 60637.

Abstract

Detecting and quantifying the differences in individual genomes (i.e., genotyping) plays a fundamental role in most modern bioinformatics pipelines. Many scientists now use reduced representation next-generation sequencing (NGS) approaches for genotyping. Genotyping diploid individuals using NGS is a well-studied field, and similar methods for polyploid individuals are just emerging. However, there are many aspects of NGS data, particularly in polyploids, that remain unexplored by most methods. Our contributions in this paper are fourfold: (i) We draw attention to, and then model, common aspects of NGS data: sequencing error, allelic bias, overdispersion, and outlying observations. (ii) Many datasets feature related individuals, and so we use the structure of Mendelian segregation to build an empirical Bayes approach for genotyping polyploid individuals. (iii) We develop novel models to account for preferential pairing of chromosomes, and harness these for genotyping. (iv) We derive oracle genotyping error rates that may be used for read depth suggestions. We assess the accuracy of our method in simulations, and apply it to a dataset of hexaploid sweet potato (Ipomoea batatas). An R package implementing our method is available at https://cran.r-project.org/package=updog.

Keywords: GBS; RAD-Seq; hierarchical modeling; read-mapping bias; sequencing.
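
To make contribution (i) concrete, the following is a minimal sketch of how sequencing error, allelic bias, and overdispersion might enter a genotype likelihood for a single individual at a single SNP. It is an illustration under stated assumptions, not the updog implementation: the function names, the default parameter values, the particular bias and beta-binomial parameterization, and the flat prior over dosages are all hypothetical choices made here. The outlier component and the Mendelian-segregation prior described in the abstract are omitted.

```python
import math


def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_logpmf(x, n, mu, tau):
    """Log-pmf of a beta-binomial with mean mu and overdispersion tau.

    Assumed parameterization: both mu and tau lie strictly in (0, 1).
    """
    a = mu * (1 - tau) / tau
    b = (1 - mu) * (1 - tau) / tau
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return log_choose + log_beta(x + a, n - x + b) - log_beta(a, b)


def genotype_posterior(ref_reads, total_reads, ploidy,
                       seq_error=0.005, bias=1.0, overdispersion=0.01):
    """Posterior over reference-allele dosages 0..ploidy under a flat prior.

    Hypothetical defaults; seq_error must be > 0 so the mean stays in (0, 1).
    bias = 1 is assumed to mean no allelic bias.
    """
    log_lik = []
    for dosage in range(ploidy + 1):
        p = dosage / ploidy
        # Sequencing error: some reads are misread as the other allele.
        f = p * (1 - seq_error) + (1 - p) * seq_error
        # Allelic bias: reweight the two alleles' chances of being observed.
        xi = f / (bias * (1 - f) + f)
        # Overdispersion: beta-binomial in place of a plain binomial.
        log_lik.append(betabinom_logpmf(ref_reads, total_reads, xi, overdispersion))
    # Normalize in log space for numerical stability.
    top = max(log_lik)
    weights = [math.exp(v - top) for v in log_lik]
    total = sum(weights)
    return [w / total for w in weights]
```

For a hexaploid, `genotype_posterior(30, 60, ploidy=6)` returns seven probabilities, one per dosage from 0 to 6. Replacing the flat prior with genotype frequencies expected under Mendelian segregation would be the natural next step toward the empirical Bayes approach of contribution (ii).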

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Alleles
Genotyping Techniques / methods*
High-Throughput Nucleotide Sequencing*
Ipomoea batatas / genetics
Models, Genetic
Polymorphism, Single Nucleotide / genetics
Polyploidy*

Associated data

figshare/10.25386/genetics.7019456
