Integrative analysis of clinical health records, imaging and pathogen genomics identifies personalized predictors of disease prognosis in tuberculosis

medRxiv [Preprint]. 2022 Jul 21:2022.07.20.22277862. doi: 10.1101/2022.07.20.22277862.

Abstract

Tuberculosis (TB) afflicts over 10 million people every year, and its global burden is projected to increase dramatically due to multidrug-resistant TB (MDR-TB). The COVID-19 pandemic has reduced access to TB diagnosis and treatment, reversing decades of progress in disease management globally. It is thus crucial to analyze real-world multi-domain information from patient health records to determine personalized predictors of TB treatment outcome and drug resistance. We conduct a retrospective analysis of electronic health records of 5060 TB patients from 10 countries with a high burden of MDR-TB, including Ukraine, Moldova, Belarus and India, available in the NIAID TB Portals database. We analyze over 200 features across multiple host and pathogen modalities representing patient social demographics, disease presentations as seen in chest X-rays and CT scans, and genomic records with drug susceptibility features of the pathogen strain from each patient. Our machine learning model, built with diverse data modalities, outperforms models built using each modality alone in predicting treatment outcomes, with an accuracy of 81% and an AUC of 0.768. We determine robust predictors across countries that are associated with unsuccessful clinical outcomes, and validate our predictions on new patient data from TB Portals. Our analysis of drug regimens and drug interactions suggests that synergistic drug combinations and those containing the drugs Bedaquiline, Levofloxacin, Clofazimine and Amoxicillin see more success in treating MDR and XDR TB. Features identified via chest imaging, such as percentage of abnormal volume, size of lung cavitation and bronchial obstruction, are significantly associated with pathogen genomic attributes of drug resistance. Increased disease severity was also observed in patients with lower BMI and with comorbidities. Our integrated multi-modal analysis thus revealed significant associations between radiological, microbiological, therapeutic, and demographic data modalities, providing a deeper understanding of personalized responses to aid in the clinical management of TB.

Keywords: EHR data; TB; Tuberculosis; Ukraine; drug-resistance; machine learning; pandemic; personalized medicine.

Publication types

  • Preprint