Challenges in gene-oriented approaches for pangenome content discovery

Brief Bioinform. 2021 May 20;22(3):bbaa198. doi: 10.1093/bib/bbaa198.

Abstract

Given a group of genomes, each represented as the set of genes that belong to it, the discovery of pangenomic content relies on searching for homology among the genes in order to cluster them into families. Pangenomic analyses then investigate which families are present in which of the given genomes. This is referred to as the gene-oriented approach, in contrast to other definitions of the problem that take different genomic features into account. In recent years, several tools have been developed to discover and analyse pangenomic content. Because the problem is hard, each tool applies a different strategy for discovering the pangenomic content. As a result, the performance of each tool varies with the composition of the input genomes. This review reports the main analysis instruments provided by the current state-of-the-art tools for the discovery of pangenomic content. Moreover, unlike previous works, the presented study compares pangenomic tools from a methodological perspective, analysing the causes that lead a given methodology to outperform the others. The analysis is performed on different bacterial populations, which are synthetically generated by varying evolutionary parameters. The benchmarks used to compare the pangenomic tools, together with the computational pipeline developed for this purpose, are available at https://github.com/InfOmics/pangenes-review.

Contact: V. Bonnici, R. Giugno

Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
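The gene-oriented view described in the abstract can be illustrated with a minimal sketch. It assumes the homology clustering has already been done by some tool and is supplied as a gene-to-family mapping; the function names, the toy identifiers and the core/accessory/singleton labels (standard pangenome terminology, not taken from the abstract) are illustrative assumptions, not the method of any tool compared in the review.

```python
from collections import defaultdict


def family_membership(genomes, gene_to_family):
    """Map each gene family to the set of genomes holding at least one of its genes."""
    membership = defaultdict(set)
    for genome, genes in genomes.items():
        for gene in genes:
            membership[gene_to_family[gene]].add(genome)
    return dict(membership)


def classify_families(membership, n_genomes):
    """Split families by how many genomes they occur in.

    core: present in every genome
    singleton: present in exactly one genome (when there is more than one genome)
    accessory: everything in between
    """
    classes = {"core": set(), "accessory": set(), "singleton": set()}
    for family, present_in in membership.items():
        if len(present_in) == n_genomes:
            classes["core"].add(family)
        elif len(present_in) == 1:
            classes["singleton"].add(family)
        else:
            classes["accessory"].add(family)
    return classes


# Hypothetical toy input: three genomes given as lists of gene identifiers.
genomes = {
    "genomeA": ["a1", "a2", "a3"],
    "genomeB": ["b1", "b2"],
    "genomeC": ["c1", "c2", "c3"],
}

# Assumed output of a homology-clustering step (gene -> family).
gene_to_family = {
    "a1": "F1", "b1": "F1", "c1": "F1",
    "a2": "F2", "c2": "F2",
    "a3": "F3",
    "b2": "F4", "c3": "F4",
}

membership = family_membership(genomes, gene_to_family)
classes = classify_families(membership, n_genomes=len(genomes))
```

In this toy input, family F1 occurs in all three genomes, F2 and F4 occur in two, and F3 occurs only in genomeA. The sketch covers only the membership analysis; building the gene-to-family mapping is the hard step on which the reviewed tools differ.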

Keywords: computational comparison; gene homology; pangenome; sequence similarity; synthetic benchmark.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Algorithms*
  • Bacteria / classification
  • Bacteria / genetics
  • Biological Evolution
  • Computational Biology / methods*
  • Genome / genetics*
  • Genome, Bacterial / genetics*
  • Genomics / methods*
  • Mycoplasma / classification
  • Mycoplasma / genetics
  • Phylogeny
  • Software