An accurate assignment test for extremely low-coverage whole-genome sequence data

Giada Ferrari; Lane M Atmore; Sissel Jentoft; Kjetill S Jakobsen; Daniel Makowiecki; James H Barrett; Bastiaan Star

doi:10.1111/1755-0998.13551

Mol Ecol Resour. 2022 May;22(4):1330-1344. doi: 10.1111/1755-0998.13551. Epub 2021 Nov 30.

Authors

Giada Ferrari¹, Lane M Atmore¹, Sissel Jentoft¹, Kjetill S Jakobsen¹, Daniel Makowiecki², James H Barrett³ ⁴, Bastiaan Star¹

Affiliations

¹ Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway.
² Department of Environmental Archaeology and Human Paleoecology, Institute of Archaeology, Nicolaus Copernicus University, Torun, Poland.
³ McDonald Institute for Archaeological Research, Department of Archaeology, University of Cambridge, Cambridge, UK.
⁴ Department of Archaeology and Cultural History, NTNU University Museum, Trondheim, Norway.

PMID: 34779123
DOI: 10.1111/1755-0998.13551

Abstract

Genomic assignment tests can provide important diagnostic biological characteristics, such as population of origin or ecotype. Yet, assignment tests often rely on moderate- to high-coverage sequence data that can be difficult to obtain for fields such as molecular ecology and ancient DNA. We have developed a novel approach that efficiently assigns biologically relevant information (i.e., population identity or structural variants such as inversions) in extremely low-coverage sequence data. First, we generate databases from existing reference data using a subset of diagnostic single nucleotide polymorphisms (SNPs) associated with a biological characteristic. Low-coverage alignment files are subsequently compared to these databases to ascertain allelic state, yielding a joint probability for each association. To assess the efficacy of this approach, we assigned haplotypes and population identity in Heliconius butterflies, Atlantic herring, and Atlantic cod using chromosomal inversion sites and whole-genome data. We scored both modern and ancient specimens, including the first whole-genome sequence data recovered from ancient Atlantic herring bones. The method accurately assigns biological characteristics, including population membership, using extremely low-coverage data (as low as 0.0001x) based on genome-wide SNPs. This approach will therefore increase the number of samples in evolutionary, ecological and archaeological research for which relevant biological information can be obtained.
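
The database-and-comparison procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `assign`, the data layout, the per-base error rate, and the equal-prior normalisation are all assumptions made for illustration. The sketch assumes each observed read contributes one base at a diagnostic SNP, that every group has an allele-frequency entry at every database site, and that sites are independent, so per-site probabilities multiply into a joint probability per group.

```python
import math


def assign(observed, database, error=0.01):
    """Return a normalised joint probability per group (hypothetical helper).

    observed: list of (site, base) pairs read from a low-coverage alignment.
    database: {site: {group: {base: allele_frequency}}} built from reference data.
    error:    assumed per-base sequencing error rate (illustrative value).
    """
    log_lik = {}
    for site, base in observed:
        # Reads at sites that are not in the database carry no information.
        for group, freqs in database.get(site, {}).items():
            p = freqs.get(base, 0.0)
            # Allow for sequencing error so one mismatch cannot zero the product.
            p = p * (1.0 - error) + (1.0 - p) * error / 3.0
            # Sum logs instead of multiplying probabilities to avoid underflow.
            log_lik[group] = log_lik.get(group, 0.0) + math.log(p)
    if not log_lik:
        return {}
    # Normalise across groups, assuming equal prior probability for each group.
    top = max(log_lik.values())
    weights = {g: math.exp(v - top) for g, v in log_lik.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# Toy database with two hypothetical groups, "A" and "B".
database = {
    ("chr1", 100): {"A": {"C": 1.0}, "B": {"T": 1.0}},
    ("chr1", 250): {"A": {"G": 0.9, "A": 0.1}, "B": {"G": 0.1, "A": 0.9}},
    ("chr2", 40): {"A": {"T": 1.0}, "B": {"C": 1.0}},
}
# Three reads: two overlap diagnostic sites, one falls outside the database.
observed = [(("chr1", 100), "C"), (("chr1", 250), "G"), (("chr2", 999), "A")]
probs = assign(observed, database)
```

In this toy case only two reads overlap diagnostic sites, yet both favour group "A", so nearly all of the normalised probability goes to "A". That mirrors why a method of this kind can work at very low coverage: each additional informative site multiplies the evidence.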

Keywords: chromosomal inversion; ecotype; genome skimming; haplotype; population assignment.

MeSH terms

Animals
Butterflies* / genetics
Ecotype
Gadus morhua* / genetics
Genome / genetics
Haplotypes
Polymorphism, Single Nucleotide
Sequence Analysis, DNA / methods
