Selective sweep sites and SNP dense regions differentiate Mycobacterium bovis isolates across scales

Front Microbiol. 2022 Sep 7:13:787856. doi: 10.3389/fmicb.2022.787856. eCollection 2022.

Abstract

Mycobacterium bovis, a zoonotic bacterial pathogen responsible for bovine tuberculosis (bTB), an economically and agriculturally important livestock disease, infects a broad range of mammalian hosts worldwide. This broad host range has led to bidirectional transmission events between livestock and wildlife species, as well as to the formation of wildlife reservoirs, hampering the success of bTB control measures. Next Generation Sequencing (NGS) has transformed our ability to understand disease transmission events by tracking variant sites. However, the genomic signatures of host adaptation following spillover, and the role of other genomic factors in the M. bovis transmission process, remain understudied. We analyzed publicly available M. bovis datasets collected from 700 hosts across three countries with bTB-endemic regions (United Kingdom, United States, and New Zealand) to investigate whether genomic regions with high SNP density and/or selective sweep sites play a role in M. bovis adaptation to new environments (e.g., at the host-species, geographical, and/or sub-population levels). A simulated M. bovis alignment was created to generate null distributions for defining genomic regions with high SNP counts and regions with evidence of selective sweeps. Random Forest (RF) models were used to investigate evolutionary metrics within these genomic regions of interest and to determine which genomic processes were most informative for classifying M. bovis across ecological scales. We identified 14 high SNP density regions and 132 selective sweep regions in the M. bovis genomes. Selective sweep regions were ranked as the most important features for classifying M. bovis across the different scales in all RF models. SNP-dense regions also had high importance in the badger- and cattle-specific RF models for distinguishing badger-derived isolates from livestock-derived ones.
Additionally, the genes detected within these genomic regions have various pathogenesis-related functions, such as virulence and immunogenicity, membrane structure, host survival, and mycobactin production. These results demonstrate how comparative genomics combined with machine learning approaches can be used to further investigate the nature of M. bovis host-pathogen interactions.
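The idea of using a simulated alignment as a null distribution for calling SNP-dense regions can be illustrated with a minimal sketch. It assumes SNP positions on a single reference coordinate system, counts SNPs in fixed sliding windows, and flags observed windows whose counts exceed an upper quantile of the window counts from the simulated data. The window size, step, quantile, and function names (`window_snp_counts`, `empirical_threshold`, `snp_dense_windows`) are hypothetical choices for illustration; this is not the study's actual pipeline or parameter set.

```python
import random
from bisect import bisect_left
from typing import List, Sequence, Tuple


def window_snp_counts(positions: Sequence[int], genome_length: int,
                      window: int, step: int) -> List[Tuple[int, int]]:
    """Return (window_start, snp_count) for each sliding window [start, start + window)."""
    ordered = sorted(positions)
    counts = []
    for start in range(0, genome_length - window + 1, step):
        # Binary search gives the number of SNPs falling inside the window.
        n = bisect_left(ordered, start + window) - bisect_left(ordered, start)
        counts.append((start, n))
    return counts


def empirical_threshold(null_counts: Sequence[int], quantile: float) -> int:
    """Upper empirical quantile of the null (simulated) window counts."""
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile must be in [0, 1]")
    ordered = sorted(null_counts)
    return ordered[min(len(ordered) - 1, int(quantile * len(ordered)))]


def snp_dense_windows(observed: Sequence[int], simulated: Sequence[int],
                      genome_length: int, window: int = 1000, step: int = 500,
                      quantile: float = 0.99) -> Tuple[List[Tuple[int, int]], int]:
    """Flag observed windows whose SNP count exceeds the simulated-null threshold.

    Hypothetical defaults: 1 kb windows, 500 bp step, 99th percentile cutoff.
    """
    null = [c for _, c in window_snp_counts(simulated, genome_length, window, step)]
    threshold = empirical_threshold(null, quantile)
    observed_counts = window_snp_counts(observed, genome_length, window, step)
    return [(s, c) for s, c in observed_counts if c > threshold], threshold


if __name__ == "__main__":
    # Toy data only: uniform background SNPs plus one planted cluster.
    rng = random.Random(0)
    length = 100_000
    simulated = [rng.randrange(length) for _ in range(200)]
    observed = [rng.randrange(length) for _ in range(200)] + list(range(50_000, 50_040))
    dense, thr = snp_dense_windows(observed, simulated, length)
    print("threshold:", thr)
    print("dense windows:", dense)
```

A selective sweep scan would follow the same pattern (a per-window statistic compared against its simulated null), but the specific sweep statistic would replace the raw SNP count.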

Keywords: Mycobacterium bovis; SNP dense regions; comparative genomics; ecological scales; geographic location; host range; population clusters; selective sweeps.