Microhaplotypes provide increased power from short-read DNA sequences for relationship inference

Mol Ecol Resour. 2018 Mar;18(2):296-305. doi: 10.1111/1755-0998.12737. Epub 2017 Dec 15.

Abstract

The accelerating rate at which DNA sequence data are now generated by high-throughput sequencing instruments provides both opportunities and challenges for population genetic and ecological investigations of animals and plants. We show here how the common practice of calling genotypes from a single SNP per sequenced region ignores substantial additional information in the phased short-read sequences that are provided by these sequencing instruments. We target sequenced regions with multiple SNPs in kelp rockfish (Sebastes atrovirens) to determine "microhaplotypes" and then call these microhaplotypes as alleles at each locus. We then demonstrate how the multi-allelic marker data from such loci dramatically increase power for relationship inference. The microhaplotype approach decreases false-positive rates by several orders of magnitude, relative to calling bi-allelic SNPs, for two challenging analytical procedures, full-sibling and single parent-offspring pair identification. We also show how the identification of half-sibling pairs requires so much data that physical linkage becomes a consideration, and that most published studies that attempt such identification are dramatically underpowered. The advent of phased short-read DNA sequence data, in conjunction with emerging analytical tools for their analysis, promises to improve efficiency by reducing the number of loci necessary for a particular level of statistical confidence, thereby lowering the cost of data collection and reducing the degree of physical linkage amongst markers used for relationship estimation. Such advances will facilitate collaborative research and management for migratory and other widespread species.
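
The idea of calling microhaplotypes as alleles can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the reads, the SNP positions, and the function names are hypothetical toy values, each read is assumed to represent one sampled gene copy, and expected heterozygosity is used only as a rough proxy for how informative a locus is. The sketch compares a single SNP with the two-SNP microhaplotype read from the same phased sequences.

```python
from collections import Counter


def call_alleles(reads, snp_positions):
    """Join the bases each phased read carries at the given SNP positions.

    With one position this yields ordinary SNP alleles; with several it
    yields microhaplotype alleles, because the bases come from the same read.
    """
    return ["".join(read[p] for p in snp_positions) for read in reads]


def expected_heterozygosity(alleles):
    """Return 1 minus the sum of squared allele frequencies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Hypothetical toy data: eight phased reads covering one short region,
# with variable sites at 0-based positions 1 and 4.
reads = [
    "ACGTA",
    "ACGTA",
    "ATGTC",
    "ATGTA",
    "ACGTC",
    "ATGTC",
    "ACGTA",
    "ATGTC",
]

single_snp = call_alleles(reads, [1])        # bi-allelic marker
microhaps = call_alleles(reads, [1, 4])      # multi-allelic marker

print(sorted(set(single_snp)), expected_heterozygosity(single_snp))
print(sorted(set(microhaps)), expected_heterozygosity(microhaps))
```

In this toy example the single SNP has two alleles, whereas the microhaplotype built from the same reads has four, so the locus has higher expected heterozygosity. More alleles per locus is the property the abstract credits with increasing power for relationship inference.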

Keywords: high-throughput DNA sequencing; microhaplotype; parentage; population genetics; relationship inference.

MeSH terms

  • Animals
  • Computational Biology / methods*
  • Fishes / classification*
  • Fishes / genetics*
  • Genetics, Population / methods*
  • Genotyping Techniques / methods*
  • Haplotypes*
  • Polymorphism, Single Nucleotide
  • Sequence Analysis, DNA / methods*