Functional genetic variation in pe/ppe genes contributes to diversity in Mycobacterium tuberculosis lineages and potential interactions with the human host

Front Microbiol. 2023 Oct 9:14:1244319. doi: 10.3389/fmicb.2023.1244319. eCollection 2023.

Abstract

Introduction: Around 10% of the coding potential of Mycobacterium tuberculosis is constituted by two poorly understood gene families, the pe and ppe loci, thought to be involved in host-pathogen interactions. Their repetitive nature and high GC content have hindered sequence analysis, leading to their exclusion from whole-genome studies. Understanding the genetic diversity of the pe/ppe families is essential to facilitate their potential translation into tools for tuberculosis prevention and treatment.

Methods: To investigate the genetic diversity of the 169 pe/ppe genes, we performed a sequence analysis across 73 long-read assemblies representing seven different lineages of M. tuberculosis and M. bovis BCG. Individual pe/ppe gene alignments were extracted, and diversity and conservation were studied across the different lineages.

Results: The pe/ppe genes were classified into three groups based on the level of protein sequence conservation relative to H37Rv. More than 50% were conserved, with indels in the pe_pgrs and ppe_mptr sub-families being major drivers of structural variation. Gene rearrangements, such as duplications and gene fusions, were observed between pe and pe_pgrs genes. Analysis of inter-lineage diversity revealed lineage-specific SNPs and indels.

Discussion: The high level of pe/ppe gene conservation, together with the lineage-specific findings, suggests that these genes are phylogenetically informative. However, structural variants and gene rearrangements differing from the reference were also identified, with potential implications for pathogenicity. Overall, improving our knowledge of these complex gene families may provide insights into pathogenicity and inform the development of much-needed tools for tuberculosis control.

Keywords: MTBC; Mycobacterium tuberculosis; diversity; genomics; pe/ppe family of genes.