eDNAssay: A machine learning tool that accurately predicts qPCR cross-amplification

Mol Ecol Resour. 2022 Nov;22(8):2994-3005. doi: 10.1111/1755-0998.13681. Epub 2022 Jul 12.

Abstract

Environmental DNA (eDNA) sampling is a highly sensitive and cost-effective technique for wildlife monitoring, notably through the use of qPCR assays. However, it can be difficult to ensure assay specificity when many closely related species co-occur. In theory, specificity may be assessed in silico by determining whether assay oligonucleotides have enough base-pair mismatches with nontarget sequences to preclude amplification. However, the mismatch qualities required are poorly understood, making in silico assessments difficult and often necessitating extensive in vitro testing, which is typically the greatest bottleneck in assay development. Increasing the accuracy of in silico assessments would therefore streamline the assay development process. In this study, we paired 10 qPCR assays with 82 synthetic gene fragments for 530 specificity tests using SYBR Green intercalating dye (n = 262) and TaqMan hydrolysis probes (n = 268). Test results were used to train random forest classifiers to predict amplification. The primer-only model (SYBR Green results) and full-assay model (TaqMan probe-based results) were 99.6% and 100% accurate, respectively, in cross-validation. We further assessed model performance using six independent assays not used in model training. In these tests the primer-only model was 92.4% accurate (n = 119) and the full-assay model was 96.5% accurate (n = 144). The high performance achieved by these models makes it possible for eDNA practitioners to more quickly and confidently develop assays specific to the intended target. Practitioners can access the full-assay model online via eDNAssay (https://NationalGenomicsCenter.shinyapps.io/eDNAssay), a user-friendly tool for predicting qPCR cross-amplification.
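As a rough illustration of the in silico step described above, the sketch below tallies base-pair mismatches between an oligonucleotide and a nontarget binding site. It is not the eDNAssay model: the abstract does not list the features the random forest classifiers use, so the function names, the example sequences, and the choice to count mismatches within 5 bases of the 3' end are all assumptions made for illustration. The binding site is assumed to be pre-aligned, ungapped, and written in the same orientation as the oligonucleotide.

```python
def mismatch_positions(oligo, site):
    """Return mismatch positions counted from the 3' end (1 = 3'-terminal base).

    Hypothetical helper: assumes `site` is the nontarget binding site written
    in the same 5'->3' orientation as `oligo`, with no gaps.
    """
    if len(oligo) != len(site):
        raise ValueError("oligo and binding site must be the same length")
    n = len(oligo)
    return [
        n - i
        for i, (a, b) in enumerate(zip(oligo.upper(), site.upper()))
        if a != b
    ]


def summarize(oligo, site, end_window=5):
    """Summarize mismatches; `end_window` (bases from the 3' end) is an assumed cutoff."""
    positions = mismatch_positions(oligo, site)
    return {
        "total_mismatches": len(positions),
        "mismatches_near_3prime": sum(1 for p in positions if p <= end_window),
    }


# Made-up sequences, for illustration only.
primer = "ACGTTGCAAGCTTAGC"
nontarget = "ACGTTGCATGCTTAGT"
print(summarize(primer, nontarget))
# {'total_mismatches': 2, 'mismatches_near_3prime': 1}
```

Summaries of this kind could in principle serve as inputs to a classifier, but which mismatch qualities actually matter is exactly what the study set out to learn from the 530 specificity tests.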

Keywords: assay; base-pair mismatches; eDNA; environmental DNA; random forest; specificity.

MeSH terms

  • Benzothiazoles
  • DNA, Environmental*
  • Diamines
  • Machine Learning
  • Oligonucleotides
  • Quinolines
  • Real-Time Polymerase Chain Reaction / methods
  • Sensitivity and Specificity

Substances

  • Benzothiazoles
  • DNA, Environmental
  • Diamines
  • Oligonucleotides
  • Quinolines
  • SYBR Green I