Combining comparative genomic analysis with machine learning reveals some promising diagnostic markers to identify five common pathogenic non-tuberculous mycobacteria

Microb Biotechnol. 2021 Jul;14(4):1539-1549. doi: 10.1111/1751-7915.13815. Epub 2021 May 21.

Abstract

Non-tuberculous mycobacteria (NTM) can cause various respiratory diseases and, in severe cases, death, and their incidence has increased rapidly worldwide. To date, it remains difficult to diagnose the various types of NTM infection precisely using routine diagnostic methods and strain identification. We combined systematic comparative genomics with machine learning to select new diagnostic markers for precisely identifying five common pathogenic NTMs (Mycobacterium kansasii, Mycobacterium avium, Mycobacterium intracellulare, Mycobacterium chelonae, Mycobacterium abscessus). A panel of six genes and two SNPs (nikA, benM, codA, pfkA2, mpr, yjcH, rrl C2638T, rrl A1173G) was selected to identify the five NTMs simultaneously with high accuracy (> 90%). Notably, the panel containing only the six genes also classified the five NTMs well (accuracy > 90%). Additionally, both panels could precisely differentiate the five NTMs from M. tuberculosis (accuracy > 99%). We also identified new marker genes, SNPs and combinations that accurately discriminate any one of the five NTMs individually, making it possible to diagnose a specific NTM infection precisely. Our research not only reveals novel, promising diagnostic markers to promote the development of precision diagnosis of NTM infections, but also provides insight into precisely identifying genetically close pathogens through comparative genomics and machine learning.
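As a rough illustration of how a marker panel of this kind could feed a classifier, the sketch below encodes each isolate as a binary vector (presence or absence of the six genes, plus whether each of the two rrl SNPs is observed) and assigns a species by nearest neighbour under Hamming distance. This is not the authors' pipeline: the abstract does not name the learning algorithm, so nearest neighbour is swapped in purely for simplicity. The species labels (`species_A` and so on), the training profiles and the `predict` helper are all hypothetical, and the 0/1 values are synthetic placeholders rather than data from the study.

```python
# Feature order: six gene presence/absence flags, then two SNP flags.
FEATURES = ["nikA", "benM", "codA", "pfkA2", "mpr", "yjcH",
            "rrl_C2638T", "rrl_A1173G"]

# SYNTHETIC placeholder profiles (1 = present/observed, 0 = absent).
# These values are invented for illustration and are NOT from the paper.
TRAINING = [
    ("species_A", (1, 1, 0, 0, 0, 0, 1, 0)),
    ("species_A", (1, 1, 0, 0, 0, 1, 1, 0)),
    ("species_B", (0, 0, 1, 1, 0, 0, 0, 1)),
    ("species_B", (0, 0, 1, 1, 0, 0, 0, 0)),
    ("species_C", (0, 0, 0, 0, 1, 1, 0, 0)),
    ("species_C", (1, 0, 0, 0, 1, 1, 0, 0)),
]


def hamming(a, b):
    """Count the positions at which two marker profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def predict(profile, training):
    """Return the label of the training profile closest to `profile`.

    Hypothetical helper: 1-nearest-neighbour under Hamming distance.
    Ties are resolved in favour of the earliest training entry.
    """
    if len(profile) != len(FEATURES):
        raise ValueError("profile must have one value per feature")
    label, _ = min(training, key=lambda item: hamming(profile, item[1]))
    return label


if __name__ == "__main__":
    query = (1, 1, 0, 0, 0, 0, 0, 0)  # synthetic isolate
    print(predict(query, TRAINING))
```

In practice, the presence/absence calls would come from a real genomic comparison, and accuracy would be estimated on held-out isolates rather than read off a toy table like this one.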

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genomics
  • Humans
  • Machine Learning
  • Mycobacterium Infections, Nontuberculous* / diagnosis
  • Nontuberculous Mycobacteria / classification*
  • Nontuberculous Mycobacteria / genetics