First-line drug resistance profiling of Mycobacterium tuberculosis: a machine learning approach

AMIA Annu Symp Proc. 2022 Feb 21;2021:891-899. eCollection 2021.

Abstract

The persistence and emergence of new multidrug-resistant Mycobacterium tuberculosis (M. tb) strains continue to advance the devastating tuberculosis (TB) epidemic. Robust systems are needed to perform drug-resistance profiling accurately and rapidly, and machine learning (ML) methods combined with genomic sequence data may provide novel insights into drug-resistance mechanisms. Using 372 M. tb isolates, this study demonstrates the combined utility of ML and bioinformatics for drug-resistance profiling. Single-nucleotide polymorphisms (SNPs), insertions/deletions (InDels), and dinucleotide frequencies are explored as input features for three ML models: Decision Trees, Random Forest, and eXtreme Gradient Boosting (XGBoost). Using SNPs and InDels, all three models performed equally well, yielding 99% accuracy, 97% recall, and a 99% F1-score. Using dinucleotide frequencies, XGBoost performed best, with 97% accuracy, 94% recall, and a 97% F1-score. This study validates the use of variants as input features and presents dinucleotide frequencies as another effective feature encoding for ML-based phenotype classification.
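The dinucleotide-frequency encoding mentioned above can be illustrated with a minimal sketch. The abstract does not say how the frequencies were computed, so the choices here are assumptions made for illustration only: overlapping windows of two bases, normalisation by the number of valid windows, and skipping windows that contain ambiguous bases such as N. The names `DINUCLEOTIDES` and `dinucleotide_frequencies` are hypothetical and are not taken from the paper.

```python
from itertools import product

# The 16 possible dinucleotides in a fixed order (AA, AC, AG, ..., TT).
# This ordering is an illustrative choice, not one stated in the paper.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_frequencies(sequence):
    """Return a 16-element vector of relative dinucleotide frequencies.

    Assumptions: overlapping windows of length 2 are counted, and windows
    containing any character other than A, C, G, or T are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # No valid window (empty, single-base, or fully ambiguous input).
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


# Toy sequence for illustration only; not data from the study.
features = dinucleotide_frequencies("ACGT")
```

Under these assumptions each genome maps to a fixed-length vector of 16 values regardless of sequence length, which is the kind of input that tree-based classifiers such as those named in the abstract can accept directly.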

MeSH terms

  • Antitubercular Agents* / pharmacology
  • Antitubercular Agents* / therapeutic use
  • Drug Resistance, Multiple, Bacterial* / genetics
  • Humans
  • Machine Learning*
  • Mycobacterium tuberculosis* / drug effects
  • Mycobacterium tuberculosis* / genetics
  • Tuberculosis* / drug therapy

Substances

  • Antitubercular Agents