Pangenome Analysis of Plant Transcripts and Coding Sequences

Methods Mol Biol. 2022;2512:121-152. doi: 10.1007/978-1-0716-2429-6_9.

Abstract

The pangenome of a species is the sum of the genomes of its individuals. As coding sequences often represent only a small fraction of each genome, analyzing the pangene set can be a cost-effective strategy for plants with large genomes or highly heterozygous species. Here, we describe a step-by-step protocol to analyze plant pangene sets with the software GET_HOMOLOGUES-EST. After a short introduction, where the main concepts are illustrated, the remaining sections cover the installation and typical operations required to analyze and annotate pantranscriptomes and gene sets of plants. The recipes include instructions on how to call core and accessory genes, how to compute a presence-absence pangenome matrix, and how to identify and analyze private genes, which are present only in some genotypes. Downstream phylogenetic analyses are also discussed.
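
To make the concepts of core, accessory, and private genes concrete, the following minimal Python sketch classifies gene clusters from a toy presence-absence matrix. It is not part of GET_HOMOLOGUES-EST and does not reproduce its output format; the matrix, the genotype names, the cluster names, and the functions classify_clusters and private_clusters are all hypothetical. It assumes that core clusters occur in every genotype, that accessory clusters occur in at least one but not all, and that private clusters occur in all genotypes of a chosen group and in none of a reference group.

```python
from typing import Dict, List

# Hypothetical toy matrix: cluster -> genotype -> number of sequences found.
# A real pangenome matrix would come from a clustering tool; this one is made up.
MATRIX: Dict[str, Dict[str, int]] = {
    "cluster_1": {"A": 1, "B": 1, "C": 1},
    "cluster_2": {"A": 1, "B": 0, "C": 1},
    "cluster_3": {"A": 0, "B": 2, "C": 0},
    "cluster_4": {"A": 1, "B": 0, "C": 0},
}
GENOTYPES: List[str] = ["A", "B", "C"]


def classify_clusters(
    matrix: Dict[str, Dict[str, int]], genotypes: List[str]
) -> Dict[str, List[str]]:
    """Split clusters into core (all genotypes) and accessory (some, not all)."""
    classes: Dict[str, List[str]] = {"core": [], "accessory": []}
    for cluster, row in matrix.items():
        # Occupancy = number of genotypes with at least one sequence.
        occupancy = sum(1 for g in genotypes if row.get(g, 0) > 0)
        if occupancy == len(genotypes):
            classes["core"].append(cluster)
        elif occupancy > 0:
            classes["accessory"].append(cluster)
    return classes


def private_clusters(
    matrix: Dict[str, Dict[str, int]], include: List[str], exclude: List[str]
) -> List[str]:
    """Clusters present in every 'include' genotype and absent from every 'exclude' one."""
    return [
        cluster
        for cluster, row in matrix.items()
        if all(row.get(g, 0) > 0 for g in include)
        and all(row.get(g, 0) == 0 for g in exclude)
    ]


classes = classify_clusters(MATRIX, GENOTYPES)
private_to_b = private_clusters(MATRIX, include=["B"], exclude=["A", "C"])
```

Counts above 1 are treated as presence, so the same functions work whether the matrix stores copy numbers or binary values.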

Keywords: Crops; Model plants; Pangene set; Pangenome; Polyploids; Scripting; Wild plants.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Phylogeny
  • Software*