Machine learning based DNA melt curve profiling enables automated novel genotype detection

BMC Bioinformatics. 2024 May 10;25(1):185. doi: 10.1186/s12859-024-05747-0.

Abstract

Surveillance for genetic variation of microbial pathogens, both within and among species, plays an important role in informing research, diagnostic, prevention, and treatment activities for disease control. However, large-scale systematic screening for novel genotypes remains challenging, in part due to technological limitations. To address this challenge, we present an advancement in universal microbial high-resolution melting (HRM) analysis that is capable of both known genotype identification and novel genotype detection. Specifically, this novel surveillance functionality is achieved through time-series modeling of sequence-defined HRM curves, which is uniquely enabled by the large-scale melt curve datasets generated using our high-throughput digital HRM platform. Taking the detection of bacterial genotypes as a model application, we demonstrate that our algorithms achieve an overall classification accuracy of over 99.7% and perform novelty detection with a sensitivity of 0.96, a specificity of 0.96, and a Youden index of 0.92. Since HRM-based DNA profiling is an inexpensive and rapid technique, our results add support for the feasibility of its use in surveillance applications.
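
For readers unfamiliar with the Youden index, the reported value follows directly from the reported sensitivity and specificity: by the standard definition, J = sensitivity + specificity - 1. The snippet below is only an illustrative check of that arithmetic, not the authors' code; the function name `youden_index` is a hypothetical helper introduced here.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Return Youden's J statistic: sensitivity + specificity - 1.

    Hypothetical helper for illustration; not taken from the paper.
    """
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sensitivity + specificity - 1.0


# Values reported in the abstract for novelty detection.
j = youden_index(0.96, 0.96)
print(round(j, 2))  # prints 0.92
```

J ranges from 0 for a detector that performs no better than chance to 1 for a perfect detector, so 0.92 indicates that few novel genotypes are missed and few known genotypes are falsely flagged as novel.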

Keywords: 16S rRNA; Emerging pathogens; Machine learning; Melt curve; Novelty detection; Pathogen identification.

MeSH terms

  • Algorithms
  • DNA, Bacterial / genetics
  • Genotype*
  • Machine Learning*
  • Nucleic Acid Denaturation / genetics

Substances

  • DNA, Bacterial