First Steps in the Analysis of Prokaryotic Pan-Genomes

Sávio Souza Costa; Luís Carlos Guimarães; Artur Silva; Siomar Castro Soares; Rafael Azevedo Baraúna

doi:10.1177/1177932220938064

Bioinform Biol Insights. 2020 Aug 7:14:1177932220938064. doi: 10.1177/1177932220938064. eCollection 2020.

Authors

Sávio Souza Costa¹,², Luís Carlos Guimarães¹, Artur Silva¹,², Siomar Castro Soares³, Rafael Azevedo Baraúna¹,²

Affiliations

¹ Centro de Genômica e Biologia de Sistemas, Universidade Federal do Pará, Belém, Brazil.
² Laboratório de Engenharia Biológica, Espaço Inovação, Parque de Ciência e Tecnologia Guamá, Belém, Brazil.
³ Instituto de Ciências Biológicas e Naturais, Universidade Federal do Triângulo Mineiro, Uberaba, Brazil.

Abstract

The pan-genome is defined as the set of orthologous and unique genes of a specific group of organisms. It is composed of the core genome, the accessory genome, and species- or strain-specific genes. A pan-genome is considered open or closed based on the alpha value of Heaps' law. In an open pan-genome, the number of gene families continues to increase as new genomes are added to the analysis, whereas in a closed pan-genome the number of gene families does not increase considerably. The first step of a pan-genome analysis is the homogenization of genome annotation: all genomes should be annotated with the same software, such as GeneMark or RAST. Subsequently, the pan-genome is calculated with one of several software tools, such as BPGA, GET_HOMOLOGUES, or PGAP, among others. This review presents these initial steps for those who want to perform a pan-genome analysis and explains key concepts of the area. Furthermore, we present the pan-genomic analysis of the 9 bacterial species with the highest number of genomes deposited in GenBank. We also show the influence of the identity and coverage parameters on the prediction of orthologous and paralogous genes. Finally, we outline perspectives for several research areas in which pan-genome analysis can be used to address important questions.
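
The open/closed criterion mentioned above can be illustrated with a minimal sketch. The abstract does not give the formula, so the parametrization here is an assumption: the commonly used power-law form n(N) = κ · N^(−α), where n is the number of new gene families found when the N-th genome is added, with α ≤ 1 read as an open pan-genome and α > 1 as a closed one. The function names and the synthetic input are hypothetical and are not taken from BPGA, GET_HOMOLOGUES, or PGAP.

```python
import math


def fit_heaps_law(genome_counts, new_genes):
    """Estimate (kappa, alpha) for n(N) = kappa * N**(-alpha).

    Assumed parametrization (not stated in the abstract). The fit is an
    ordinary least-squares line in log-log space, so every value in both
    lists must be strictly positive.
    """
    xs = [math.log(n) for n in genome_counts]
    ys = [math.log(g) for g in new_genes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log n = log kappa - alpha * log N, so alpha is the negated slope.
    return math.exp(intercept), -slope


def classify_pan_genome(alpha):
    """Apply the assumed threshold: alpha > 1 is closed, otherwise open."""
    return "closed" if alpha > 1 else "open"


if __name__ == "__main__":
    # Synthetic, illustrative data only: new gene families per added genome.
    counts = list(range(2, 11))
    genes = [500.0 * n ** -0.5 for n in counts]
    kappa, alpha = fit_heaps_law(counts, genes)
    print(f"kappa={kappa:.1f} alpha={alpha:.2f} -> {classify_pan_genome(alpha)}")
```

In practice the new-gene counts would be averaged over many random orderings of the genomes before fitting, and dedicated tools may use a different fitting procedure or a different form of the power law, so the estimated exponent should be compared only against the threshold defined for the same parametrization.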

Keywords: Pan-genome; accessory genome; core genome.

Publication types

Review