Drug resistance prediction and resistance genes identification in Mycobacterium tuberculosis based on a hierarchical attentive neural network utilizing genome-wide variants

Brief Bioinform. 2022 May 13;23(3):bbac041. doi: 10.1093/bib/bbac041.

Abstract

Predicting antimicrobial resistance from whole-genome sequencing data has attracted increasing attention because it is rapid and convenient. Numerous machine learning-based studies have used genetic variants to predict drug resistance in Mycobacterium tuberculosis (MTB) under the assumption that variants are homogeneous. However, most of these studies ignore the essential correlation between variants and their corresponding genes when encoding variants, and use only a limited number of variants as prediction input. In this study, to take advantage of genome-wide variants for drug-resistance prediction, and inspired by natural language processing, we formulate drug resistance prediction as document classification, in which variants are treated as words, the mutated genes of an isolate as sentences, and the isolate as a document. We propose a novel hierarchical attentive neural network model (HANN) that captures the interactions among variants within a mutated gene as well as among mutated genes within an isolate, which helps to discover drug resistance-related genes and variants and yields more interpretable biological results. For the four first-line drugs isoniazid (INH), rifampicin (RIF), ethambutol (EMB) and pyrazinamide (PZA), the HANN achieves best areas under the ROC curve (AUC) of 97.90, 99.05, 96.44 and 95.14% and best sensitivities of 94.63, 96.31, 92.56 and 87.05%, respectively. In addition, without any domain knowledge, the model identifies drug resistance-related genes and variants consistent with those confirmed by previous studies and, more importantly, discovers one additional potential drug resistance-related gene.
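The document-classification framing above (variants as words, mutated genes as sentences, an isolate as a document) can be illustrated with a minimal Python sketch. This is not the authors' HANN implementation. The names `group_by_gene` and `HierarchicalAttentionSketch`, the 8-dimensional random embeddings, the simplified attention score (context vector dotted with tanh(h), i.e. additive attention with an identity projection and no recurrent encoder), and the example variant list are all hypothetical choices made for illustration. The parameters are random and untrained, so the output probability carries no meaning; the sketch only shows how variants are grouped by gene, pooled by word-level attention into gene vectors, and pooled again by gene-level attention into a single isolate vector that feeds a logistic output.

```python
import math
import random

DIM = 8  # hypothetical embedding size, chosen only for illustration


def group_by_gene(variants):
    """Turn flat (gene, variant) pairs into an ordered 'document':
    gene ('sentence') -> list of variant tokens ('words').
    Tokens are prefixed with the gene name so that the same change in
    two different genes becomes two different tokens."""
    doc = {}
    for gene, variant in variants:
        doc.setdefault(gene, []).append(f"{gene}_{variant}")
    return doc


def random_vector(rng, dim=DIM):
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Simplified attention: score_i = context . tanh(h_i).
    Returns the attention-weighted sum of the h_i and the weights."""
    scores = [sum(c * math.tanh(x) for c, x in zip(context, h)) for h in vectors]
    weights = softmax(scores)
    pooled = [sum(w * h[d] for w, h in zip(weights, vectors))
              for d in range(len(vectors[0]))]
    return pooled, weights


class HierarchicalAttentionSketch:
    """Hypothetical two-level attention model with random, untrained weights."""

    def __init__(self, seed=0, dim=DIM):
        self.rng = random.Random(seed)
        self.dim = dim
        self.embeddings = {}  # token -> vector, created lazily
        self.word_context = random_vector(self.rng, dim)  # variant-level query
        self.gene_context = random_vector(self.rng, dim)  # gene-level query
        self.out_weights = random_vector(self.rng, dim)
        self.out_bias = 0.0

    def embed(self, token):
        if token not in self.embeddings:
            self.embeddings[token] = random_vector(self.rng, self.dim)
        return self.embeddings[token]

    def forward(self, doc):
        gene_vectors, variant_weights = [], {}
        # Level 1: attend over the variants within each gene.
        for gene, tokens in doc.items():
            vec, weights = attention_pool([self.embed(t) for t in tokens],
                                          self.word_context)
            gene_vectors.append(vec)
            variant_weights[gene] = dict(zip(tokens, weights))
        # Level 2: attend over the mutated genes within the isolate.
        isolate_vec, gene_weights = attention_pool(gene_vectors, self.gene_context)
        logit = sum(w * x for w, x in zip(self.out_weights, isolate_vec)) + self.out_bias
        prob = 1.0 / (1.0 + math.exp(-logit))  # resistant vs. susceptible
        return prob, dict(zip(doc.keys(), gene_weights)), variant_weights


# Illustrative isolate; the variant list is an assumption, not study data.
isolate = [("katG", "S315T"), ("rpoB", "S450L"), ("rpoB", "H445Y"), ("embB", "M306V")]
doc = group_by_gene(isolate)
model = HierarchicalAttentionSketch(seed=42)
prob, gene_attn, variant_attn = model.forward(doc)
print(list(doc))  # ['katG', 'rpoB', 'embB']
```

Prefixing each variant token with its gene is one simple way to keep the variant-gene correlation that the abstract emphasizes. The returned gene-level and variant-level attention dictionaries are the kind of weights that, after training on phenotype-labelled isolates, could be inspected to rank candidate resistance genes and variants.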

Keywords: Mycobacterium tuberculosis; deep learning; drug resistance prediction; hierarchical attention; natural language processing; phenotype prediction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Antitubercular Agents / pharmacology
  • Antitubercular Agents / therapeutic use
  • Drug Resistance
  • Microbial Sensitivity Tests
  • Mutation
  • Mycobacterium tuberculosis*
  • Neural Networks, Computer

Substances

  • Antitubercular Agents