Screening of Long Non-coding RNAs Biomarkers for the Diagnosis of Tuberculosis and Preliminary Construction of a Clinical Diagnosis Model

Front Microbiol. 2022 Mar 3:13:774663. doi: 10.3389/fmicb.2022.774663. eCollection 2022.

Abstract

Background: Pathogen-based testing for tuberculosis (TB) is not yet sufficient for early and differential clinical diagnosis. We therefore investigated whether long non-coding RNAs (lncRNAs) screened from human hosts, combined with electronic health record (EHR) metrics through machine learning (ML) algorithms, could be used to construct a diagnostic model.

Methods: A total of 2,759 subjects were included in this study: 12 in the primary screening cohort [7 TB patients and 5 healthy controls (HCs)] and 2,747 in the selection cohort (798 TB patients, 299 patients with non-TB lung disease, and 1,650 HCs). An Affymetrix HTA2.0 array and qRT-PCR were applied to screen new specific lncRNA markers for TB in individual nucleated cells from host peripheral blood. An ML algorithm was established to combine the patients' EHR information and lncRNA data via logistic regression models and nomogram visualization to differentiate pulmonary TB (PTB) from other conditions among patients with suspected disease in the selection cohort.
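
The logistic regression and nomogram step described in the Methods can be illustrated with a minimal sketch. It assumes the usual nomogram construction, in which each predictor's contribution to the linear predictor is rescaled so that the predictor with the widest effect range spans 0 to 100 points. The intercept, coefficients, predictor names, and ranges below are hypothetical placeholders and are not values from the study.

```python
import math

# Hypothetical intercept, coefficients and predictor ranges, for illustration
# only; none of these values come from the study.
INTERCEPT = -1.0
COEFS = {"lncRNA_1": -0.8, "age_years": 0.02, "night_sweats": 1.1}
RANGES = {"lncRNA_1": (0.0, 5.0), "age_years": (18.0, 90.0), "night_sweats": (0.0, 1.0)}


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def build_nomogram(coefs, ranges):
    """Return the zero-point reference value per predictor and the largest effect span.

    The reference is the end of the range where the predictor contributes
    least to risk, so every predictor scores zero or more points.
    """
    refs = {n: (ranges[n][0] if b >= 0 else ranges[n][1]) for n, b in coefs.items()}
    max_span = max(abs(b) * (ranges[n][1] - ranges[n][0]) for n, b in coefs.items())
    return refs, max_span


def total_points(patient, coefs, refs, max_span):
    """Sum the nomogram points for one patient (dict of predictor values)."""
    return sum(100.0 * coefs[n] * (patient[n] - refs[n]) / max_span for n in coefs)


def probability_from_points(points, intercept, coefs, refs, max_span):
    """Convert total points back to a predicted probability."""
    base = intercept + sum(coefs[n] * refs[n] for n in coefs)
    return sigmoid(base + points * max_span / 100.0)


def direct_probability(patient, intercept, coefs):
    """Predicted probability computed straight from the logistic model."""
    return sigmoid(intercept + sum(coefs[n] * patient[n] for n in coefs))


if __name__ == "__main__":
    refs, max_span = build_nomogram(COEFS, RANGES)
    patient = {"lncRNA_1": 1.5, "age_years": 40.0, "night_sweats": 1.0}
    pts = total_points(patient, COEFS, refs, max_span)
    print(round(pts, 1))
    print(round(probability_from_points(pts, INTERCEPT, COEFS, refs, max_span), 3))
```

The point scale is only a relabeling of the fitted model: converting the total points back to a probability gives the same value as evaluating the logistic regression directly, which is why a nomogram can be read by hand without changing the model's predictions.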

Results: Two differentially expressed lncRNAs (TCONS_00001838 and n406498) were identified (p < 0.001) in the selection cohort. The optimal model was the "LncRNA + EHR" model, which included these two lncRNAs and eight EHR parameters (age, hemoglobin, lymphocyte count, interferon-gamma release assay result, weight loss, night sweats, polymorphic changes, and calcified foci on imaging). This model was visualized as a nomogram and validated; its accuracy was 0.79 (0.75-0.82), with a sensitivity of 0.81 (0.78-0.86), a specificity of 0.73 (0.64-0.79), and an area under the ROC curve (AUC) of 0.86. Furthermore, the nomogram showed good compliance in predicting the risk of TB and a higher net benefit than the "EHR" model for threshold probabilities of 0.2-1.
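
For readers less familiar with the metrics reported above, the following minimal sketch shows how accuracy, sensitivity, specificity, AUC, and decision-curve net benefit are conventionally computed from predicted probabilities. The labels and probabilities are a small made-up toy set, not study data, and the net benefit formula used (true positives per subject minus false positives per subject weighted by the odds of the threshold probability) is the standard decision-curve definition, assumed here rather than taken from the paper.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN when predicting positive at prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def diagnostic_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, y_prob):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit at a threshold probability strictly below 1."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Toy labels (1 = TB, 0 = not TB) and predicted probabilities; made up.
    y_true = [1, 1, 1, 0, 0, 1, 0, 0]
    y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1, 0.2]
    print(diagnostic_metrics(*confusion_counts(y_true, y_prob, 0.5)))
    print(auc(y_true, y_prob))
    print(net_benefit(y_true, y_prob, 0.5))
```

Comparing two models by net benefit across a range of threshold probabilities, as in the abstract's comparison of the "LncRNA + EHR" and "EHR" models, amounts to evaluating `net_benefit` for each model at each threshold and plotting the two curves.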

Conclusion: LncRNAs TCONS_00001838 and n406498 have the potential to become new molecular markers for PTB, and the nomogram of the "LncRNA + EHR" model is expected to be effective for the early clinical diagnosis of TB.

Keywords: diagnostic models; long non-coding RNA; machine learning algorithms; molecular markers; tuberculosis.