Use of pan-genome analysis for the identification of lineage-specific genes of Helicobacter pylori

FEMS Microbiol Lett. 2017 Jan;364(2):fnw296. doi: 10.1093/femsle/fnw296. Epub 2016 Dec 22.

Abstract

The human bacterial pathogen Helicobacter pylori has a highly variable genome, with significant allelic and sequence diversity between isolates and even within well-characterised strains, hampering comparative genomics of H. pylori. In this study, pan-genome analysis has been used to identify lineage-specific genes of H. pylori. A total of 346 H. pylori genomes spanning the hpAfrica1, hpAfrica2, hpAsia2, hpEurope, hspAmerind and hspEAsia multilocus sequence typing (MLST) lineages were searched for genes specifically overrepresented or underrepresented in MLST lineages or associated with the cag pathogenicity island. The only genes overrepresented in cag-positive genomes were the cag pathogenicity island genes themselves. In contrast, a total of 125 genes were either overrepresented or underrepresented in one or more MLST lineages. Of these 125 genes, alcohol/aldehyde-reducing enzymes linked with acid resistance and production of toxic aldehydes were found to be overrepresented in African lineages. Conversely, the FecA2 ferric citrate receptor was missing from hspAmerind genomes, but present in all other lineages. This work shows the applicability of pan-genome analysis for identification of lineage-specific genes of H. pylori, facilitating further investigation to allow linkage of differential distribution of genes with disease outcome or virulence of H. pylori.
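As an illustration of what testing a gene for over- or underrepresentation in a lineage can look like, the sketch below applies a two-sided Fisher's exact test to a gene presence/absence table. The abstract does not state which statistical test or software the authors used, so the choice of Fisher's exact test is an assumption, and the function names (`fisher_exact_two_sided`, `lineage_association`) and the toy genome counts are hypothetical and are not data from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


def lineage_association(presence, lineages, target):
    """Compare gene presence inside vs. outside one lineage.

    presence: dict mapping genome id -> True if the gene is present
    lineages: dict mapping genome id -> lineage label
    target:   lineage label to test
    Returns (direction, p_value), where direction is "over", "under" or "none".
    """
    a = sum(1 for g in presence if lineages[g] == target and presence[g])
    b = sum(1 for g in presence if lineages[g] == target and not presence[g])
    c = sum(1 for g in presence if lineages[g] != target and presence[g])
    d = sum(1 for g in presence if lineages[g] != target and not presence[g])
    frac_in = a / (a + b) if a + b else 0.0
    frac_out = c / (c + d) if c + d else 0.0
    if frac_in > frac_out:
        direction = "over"
    elif frac_in < frac_out:
        direction = "under"
    else:
        direction = "none"
    return direction, fisher_exact_two_sided(a, b, c, d)


# Toy data (made up for illustration): a gene absent from 5 genomes of one
# lineage and present in 15 genomes of other lineages.
toy_lineages = {f"genome_{i}": ("hspAmerind" if i < 5 else "other")
                for i in range(20)}
toy_presence = {g: lin != "hspAmerind" for g, lin in toy_lineages.items()}

direction, p_value = lineage_association(toy_presence, toy_lineages, "hspAmerind")
print(direction, p_value)
```

In a real pan-genome screen this test would be repeated for every gene and every lineage, so some form of multiple-testing correction would normally be applied to the resulting p-values.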

Keywords: Helicobacter pylori; comparative genomics; genome sequences; multilocus sequence typing; pan-genome analysis.

Publication types

  • Comparative Study

MeSH terms

  • Genes, Bacterial*
  • Genetic Variation*
  • Genome, Bacterial*
  • Genomic Islands
  • Helicobacter Infections / microbiology
  • Helicobacter pylori / classification
  • Helicobacter pylori / genetics*
  • Helicobacter pylori / isolation & purification
  • Humans
  • Metabolic Networks and Pathways / genetics
  • Multilocus Sequence Typing
  • Virulence Factors / genetics

Substances

  • Virulence Factors