An SGSGeneloss-Based Method for Constructing a Gene Presence-Absence Table Using Mosdepth

Methods Mol Biol. 2022;2512:73-80. doi: 10.1007/978-1-0716-2429-6_5.

Abstract

Presence-absence variants (PAVs) are genomic regions that are present in some individuals of a species but absent from others. PAVs have been shown to contribute to genomic diversity, especially in bacteria and plants. These structural variants have been linked to traits and can be used to trace a species' evolutionary history. PAVs are usually called by aligning short-read sequence data from one or more individuals to a reference genome or pangenome assembly and then comparing read coverage across the reference. Regions to which no reads align are considered absent in that individual, and regions that are absent in some individuals but present in others are classified as PAVs. The method below details how to align sequence reads to a reference and how to use the sequencing-coverage calculator Mosdepth to identify PAVs and construct a PAV table for downstream comparative genome analysis.
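As a rough illustration of the final step, the Python sketch below turns per-gene coverage into a presence-absence table. It assumes Mosdepth was run once per sample with a gene BED file that has a name column, for example `mosdepth -n --by genes.bed sample1 sample1.bam`, which writes the mean depth of each region to `sample1.regions.bed.gz`. The file names, the helper functions, and the mean-depth cutoff of 2 are hypothetical choices for illustration; they are not the chapter's published calling criteria.

```python
from typing import Dict, Iterable, List


def parse_regions(lines: Iterable[str]) -> Dict[str, float]:
    """Map gene name -> mean depth from mosdepth *.regions.bed lines.

    Assumes the --by BED had a 4th (name) column, so each line is
    tab-separated: chrom, start, end, name, mean_depth.
    """
    depths: Dict[str, float] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 5:
            raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
        depths[fields[3]] = float(fields[4])
    return depths


def build_pav_table(samples: Dict[str, Dict[str, float]],
                    min_depth: float = 2.0) -> Dict[str, Dict[str, int]]:
    """Return gene -> {sample: 1 (present) or 0 (absent)}.

    A gene is called absent in a sample if its mean depth is below
    min_depth (a hypothetical cutoff) or it is missing from that sample.
    """
    genes = sorted({gene for depths in samples.values() for gene in depths})
    return {
        gene: {name: int(depths.get(gene, 0.0) >= min_depth)
               for name, depths in samples.items()}
        for gene in genes
    }


def format_table(table: Dict[str, Dict[str, int]], sample_names: List[str]) -> str:
    """Render the PAV table as tab-separated text, one row per gene."""
    header = "gene\t" + "\t".join(sample_names)
    rows = [gene + "\t" + "\t".join(str(calls[s]) for s in sample_names)
            for gene, calls in table.items()]
    return "\n".join([header] + rows)


if __name__ == "__main__":
    # Toy lines standing in for two samples' regions.bed contents (not real data).
    s1 = ["chr1\t0\t1000\tgeneA\t12.5", "chr1\t2000\t3000\tgeneB\t0.40"]
    s2 = ["chr1\t0\t1000\tgeneA\t9.10", "chr1\t2000\t3000\tgeneB\t7.75"]
    samples = {"s1": parse_regions(s1), "s2": parse_regions(s2)}
    print(format_table(build_pav_table(samples), ["s1", "s2"]))
```

With the toy input, geneB is called absent in `s1` and present in `s2`, so it would appear as a PAV. Mean depth over a whole gene is a simplification: a gene with a deleted exon can still pass a mean-depth cutoff, so a stricter rule might instead use the fraction of bases covered (Mosdepth can report this with `--thresholds`).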

Keywords: Gene loss; Presence–absence variants; SGSGeneLoss; Single-nucleotide polymorphisms.

MeSH terms

  • Genome*
  • Genomics* / methods
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Sequence Analysis, DNA / methods