Comparison of Whole Genome (wg-) and Core Genome (cg-) MLST (BioNumerics™) Versus SNP Variant Calling for Epidemiological Investigation of Pseudomonas aeruginosa

Front Microbiol. 2020 Jul 22:11:1729. doi: 10.3389/fmicb.2020.01729. eCollection 2020.

Abstract

Whole genome sequencing (WGS) is increasingly used for epidemiological investigations of pathogens. While SNP variant calling is currently considered the most suitable method, the choice of a representative reference genome and the dependency of results on the isolates included limit standardization and affect resolution in an unknown manner. Whole or core genome multilocus sequence typing (wg-, cgMLST) represents an attractive alternative. Here, we assess the accuracy of wg- and cgMLST by comparing results for four Pseudomonas aeruginosa datasets for which epidemiological and genomic data were previously described. Three datasets included 155 isolates from three different sequence types (STs) of P. aeruginosa collected in our ICUs over a 5-year period. The fourth dataset consisted of 10 isolates from an investigation of P. aeruginosa-contaminated hand soap. All isolates were previously analyzed by a core SNP approach. In this study, wg- and cgMLST were performed in BioNumerics™ using a scheme developed by Applied Maths. The correlation between SNP calling and wg- or cgMLST results was evaluated by fitting linear regressions between the number of SNPs and the number of allele differences in pairwise comparisons of isolates and calculating their coefficients of determination (R²). The numbers of SNPs and allele differences between isolates with close epidemiological linkage ranged from 0 to 26 and from 0 to 13, respectively. When compared to core-SNP calling, a higher R² was obtained with cgMLST (0.92-0.99) than with wgMLST (0.78-0.99). In one dataset, a putative homologous recombination of a large DNA fragment (202 loci) was identified among the isolates, affecting their phylogeny but with no impact on the epidemiological analysis of outbreak isolates. In conclusion, we showed that the P. aeruginosa wgMLST scheme in BioNumerics™ is as discriminatory as the core-SNP calling approach and appears suitable for outbreak investigations.
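The pairwise comparison and regression described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the BioNumerics™ implementation: allele profiles are represented as hypothetical lists of allele IDs per locus, loci missing in either isolate are simply skipped (an assumption; BioNumerics™ may treat missing data differently), and the SNP counts in the usage example are made-up toy values, not data from this study. The fit is ordinary least squares with an intercept, for which R² equals the squared Pearson correlation.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical representation: one allele ID per locus, None = locus not called.
Profile = List[Optional[int]]
Pair = Tuple[str, str]


def allele_distance(a: Profile, b: Profile) -> int:
    """Count loci called in both isolates whose allele IDs differ (missing loci skipped)."""
    return sum(
        1 for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


def pairwise_allele_distances(profiles: Dict[str, Profile]) -> Dict[Pair, int]:
    """Allele differences for every unordered isolate pair (names sorted)."""
    return {
        (i, j): allele_distance(profiles[i], profiles[j])
        for i, j in combinations(sorted(profiles), 2)
    }


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept, R^2)."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Toy usage: hypothetical isolates A, B, C and made-up SNP counts.
profiles = {
    "A": [1, 1, 1, 1, 1],
    "B": [1, 2, 1, 1, 1],
    "C": [2, 2, 1, None, 3],
}
alleles = pairwise_allele_distances(profiles)  # {('A', 'B'): 1, ('A', 'C'): 3, ('B', 'C'): 2}
snps = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 4}

pairs = sorted(alleles)
slope, intercept, r2 = linear_fit([snps[p] for p in pairs], [alleles[p] for p in pairs])
print(slope, intercept, r2)  # → 0.5 0.0 1.0
```

On Python 3.10 and later, `statistics.linear_regression` and `statistics.correlation` from the standard library could replace the hand-written fit; it is spelled out here so the R² formula is visible.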
We also showed that epidemiologically linked isolates showed no more than 26 SNPs or 13 allele differences. These values are important reference points for distinguishing outbreak from non-outbreak isolates when interpreting WGS results. However, as P. aeruginosa is highly recombinant, a cgMLST approach is preferable, and attention should be paid to possible recombination of large DNA fragments.
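As a rough illustration of how such values might be applied when screening pairwise distances, the sketch below flags isolate pairs whose distance does not exceed a cutoff. The cutoffs are the upper ends of the ranges reported for these P. aeruginosa datasets, not validated general thresholds; the inclusive comparison, the function name `flag_close_pairs` and the toy distances are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]

# Upper ends of the ranges observed between epidemiologically linked isolates
# in this study (assumed inclusive here; not a validated general threshold).
SNP_CUTOFF = 26
ALLELE_CUTOFF = 13


def flag_close_pairs(distances: Dict[Pair, int], cutoff: int) -> List[Pair]:
    """Return isolate pairs whose distance is <= cutoff, in sorted order."""
    return sorted(pair for pair, d in distances.items() if d <= cutoff)


# Toy usage with hypothetical allele-difference counts.
allele_diffs = {("A", "B"): 4, ("A", "C"): 40, ("B", "C"): 13}
print(flag_close_pairs(allele_diffs, ALLELE_CUTOFF))  # → [('A', 'B'), ('B', 'C')]
```

A flag only marks genetic closeness; because recombination of large DNA fragments can inflate distances, flagged and unflagged pairs alike still need epidemiological review.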

Keywords: Pseudomonas aeruginosa; cgMLST; evaluation; molecular typing; wgMLST; whole genome sequencing.