BPGA- an ultra-fast pan-genome analysis pipeline

Narendrakumar M Chaudhari; Vinod Kumar Gupta; Chitra Dutta

doi:10.1038/srep24373

Sci Rep. 2016 Apr 13;6:24373. doi: 10.1038/srep24373.

Authors

Narendrakumar M Chaudhari¹, Vinod Kumar Gupta¹, Chitra Dutta¹

Affiliation

¹ Structural Biology & Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India.

Abstract

Recent advances in ultra-high-throughput sequencing technology and metagenomics have led to a paradigm shift in microbial genomics, from comparisons of a few genomes to large-scale pan-genome studies at different scales of phylogenetic resolution. Pan-genome studies provide a framework for estimating the genomic diversity of a dataset, determining the core (conserved), accessory (dispensable) and unique (strain-specific) gene pools of a species, tracing horizontal gene flux across strains and providing insight into species evolution. Existing pan-genome software tools suffer from various limitations, such as restrictions on dataset size, difficult installation or demanding requirements, and inadequate functional features. Here we present an ultra-fast computational pipeline, BPGA (Bacterial Pan Genome Analysis tool), with seven functional modules. In addition to routine pan-genome analyses, BPGA introduces a number of novel features for downstream analyses, such as core/pan/MLST (Multi Locus Sequence Typing) phylogeny, exclusive presence/absence of genes in specific strains, subset analysis, atypical G + C content analysis and KEGG & COG mapping of core, accessory and unique genes. Other notable features include minimal prerequisites for running, freedom to select the gene clustering method, ultra-fast execution, a user-friendly command line interface and high-quality graphical outputs. The performance of BPGA has been evaluated using a dataset of complete genome sequences of 28 Streptococcus pyogenes strains.
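The abstract distinguishes core (conserved), accessory (dispensable) and unique (strain-specific) genes, and mentions finding genes exclusively present in specific strains. As a rough illustration only, and not BPGA's actual code, the sketch below assumes gene families have already been clustered and are supplied as a mapping from a hypothetical family ID to the set of genomes carrying it. Using the usual definitions, core families occur in every genome, unique families in exactly one, and accessory families in more than one but not all. The helper names `classify_gene_families` and `exclusively_present`, and the rule that an "exclusive" family must occur in every genome of the chosen subset and in none outside it, are assumptions made for this example.

```python
from typing import Dict, Set


def classify_gene_families(presence: Dict[str, Set[str]],
                           genomes: Set[str]) -> Dict[str, Set[str]]:
    """Split gene families into core, accessory and unique pools.

    Hypothetical helper: `presence` maps a gene-family (cluster) ID to the
    set of genome IDs in which that family was found.
    """
    n = len(genomes)
    pools: Dict[str, Set[str]] = {"core": set(), "accessory": set(), "unique": set()}
    for family, carriers in presence.items():
        k = len(carriers & genomes)  # number of genomes carrying this family
        if k == n:
            pools["core"].add(family)        # present in every genome
        elif k == 1:
            pools["unique"].add(family)      # present in exactly one genome
        elif k > 1:
            pools["accessory"].add(family)   # present in some, but not all
    return pools


def exclusively_present(presence: Dict[str, Set[str]],
                        subset: Set[str],
                        genomes: Set[str]) -> Set[str]:
    """Families found in every genome of `subset` and in no other genome.

    Assumed definition of "exclusive presence", for illustration only.
    """
    outside = genomes - subset
    return {family for family, carriers in presence.items()
            if subset <= carriers and not (carriers & outside)}


if __name__ == "__main__":
    genomes = {"S1", "S2", "S3"}
    presence = {
        "famA": {"S1", "S2", "S3"},
        "famB": {"S1", "S2"},
        "famC": {"S3"},
    }
    pools = classify_gene_families(presence, genomes)
    print({pool: sorted(fams) for pool, fams in pools.items()})
    # → {'core': ['famA'], 'accessory': ['famB'], 'unique': ['famC']}
    print(sorted(exclusively_present(presence, {"S1", "S2"}, genomes)))
    # → ['famB']
```

Working on plain sets keeps the classification independent of whichever clustering method produced the families, which mirrors the abstract's point that the clustering step is user-selectable.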

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Genome, Bacterial*
High-Throughput Nucleotide Sequencing / methods*
Phylogeny
Software
Streptococcus pyogenes / classification
Streptococcus pyogenes / genetics