High efficiency error suppression for accurate detection of low-frequency variants

Nucleic Acids Res. 2019 Sep 5;47(15):e87. doi: 10.1093/nar/gkz474.

Abstract

Detection of cancer-associated somatic mutations has broad applications for oncology and precision medicine. However, this becomes challenging when cancer-derived DNA is in low abundance, such as in impure tissue specimens or in circulating cell-free DNA. Next-generation sequencing (NGS) is particularly prone to technical artefacts that can limit the accuracy for calling low-allele-frequency mutations. State-of-the-art methods to improve detection of low-frequency mutations often employ unique molecular identifiers (UMIs) for error suppression; however, these methods are highly inefficient as they depend on redundant sequencing to assemble consensus sequences. Here, we present a novel strategy to enhance the efficiency of UMI-based error suppression by retaining single reads (singletons) that can participate in consensus assembly. This 'Singleton Correction' methodology outperformed other UMI-based strategies in efficiency, leading to greater sensitivity with high specificity in a cell line dilution series. Significant benefits were seen with Singleton Correction at sequencing depths ≤16 000×. We validated the utility and generalizability of this approach in a cohort of >300 individuals whose peripheral blood DNA was subjected to hybrid capture sequencing at ∼5000× depth. Singleton Correction can be incorporated into existing UMI-based error suppression workflows to boost mutation detection accuracy, thus improving the cost-effectiveness and clinical impact of NGS.
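The idea of retaining singletons during UMI-based consensus assembly can be illustrated with a minimal Python sketch. The abstract does not specify the rescue rule, so the sketch assumes one plausible rule: a singleton is kept only if a read family or another singleton with the same UMI exists on the complementary strand, and positions where the two disagree are masked as N. The function names (consensus, collapse), the 0.7 majority threshold, the (umi, strand, sequence) input layout, and the toy reads are all hypothetical and do not come from the authors' software.

```python
from collections import Counter, defaultdict


def consensus(seqs, min_frac=0.7):
    """Per-position majority vote; emit 'N' where no base reaches min_frac."""
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_frac else "N")
    return "".join(out)


def collapse(reads, rescue_singletons=True):
    """Collapse reads into one sequence per (umi, strand) molecule.

    reads: iterable of (umi, strand, sequence) with strand '+' or '-'.
    Assumption: sequences are equal length and already oriented to the
    same reference strand, so they can be compared base by base.
    """
    families = defaultdict(list)
    for umi, strand, seq in reads:
        families[(umi, strand)].append(seq)

    # Conventional step: only families with >= 2 reads yield a consensus.
    kept = {k: consensus(v) for k, v in families.items() if len(v) >= 2}
    singletons = {k: v[0] for k, v in families.items() if len(v) == 1}
    if not rescue_singletons:
        return kept

    # Assumed rescue rule: look for support on the complementary strand.
    opposite = {"+": "-", "-": "+"}
    rescued = {}
    for (umi, strand), seq in singletons.items():
        partner_key = (umi, opposite[strand])
        partner = kept.get(partner_key, singletons.get(partner_key))
        if partner is None:
            continue  # unsupported singleton is still discarded
        # Keep bases both strands agree on; mask disagreements.
        rescued[(umi, strand)] = "".join(
            a if a == b else "N" for a, b in zip(seq, partner)
        )
    kept.update(rescued)
    return kept


# Toy input (invented for illustration only).
reads = [
    ("AAT", "+", "ACGT"), ("AAT", "+", "ACGT"),
    ("AAT", "+", "ACTT"), ("AAT", "+", "ACGT"),  # family of four, one error
    ("AAT", "-", "ACGA"),                        # singleton with a consensus partner
    ("CCG", "+", "ACGT"), ("CCG", "-", "ACGT"),  # two singletons supporting each other
    ("GGA", "+", "ACGT"),                        # singleton with no partner
]

without_rescue = collapse(reads, rescue_singletons=False)
with_rescue = collapse(reads)
print(len(without_rescue), "molecules without rescue;", len(with_rescue), "with rescue")
```

In this toy input the conventional approach keeps a single molecule, while the rescue step recovers three more without any additional sequencing, which is the kind of efficiency gain the abstract describes. Real pipelines would also need to handle alignment coordinates, base qualities, and UMI sequencing errors, none of which are modelled here.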

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alleles
  • Cell Line, Tumor
  • DNA Barcoding, Taxonomic / methods*
  • Fetal Blood / cytology
  • Fetal Blood / metabolism
  • Gene Frequency
  • HCT116 Cells
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Leukemia, Myeloid, Acute / genetics*
  • Leukemia, Myeloid, Acute / pathology
  • Leukocytes, Mononuclear / metabolism
  • Leukocytes, Mononuclear / pathology
  • Mutation*
  • Neoplasm Proteins / genetics*
  • Precision Medicine / methods
  • Scientific Experimental Error
  • Sequence Analysis, DNA / methods*

Substances

  • Neoplasm Proteins