Assessing the performance of methods for copy number aberration detection from single-cell DNA sequencing data

PLoS Comput Biol. 2020 Jul 13;16(7):e1008012. doi: 10.1371/journal.pcbi.1008012. eCollection 2020 Jul.

Abstract

Single-cell DNA sequencing technologies are enabling the study of mutations and their evolutionary trajectories in cancer. Somatic copy number aberrations (CNAs) have been implicated in the development and progression of various types of cancer. A wide array of methods for CNA detection has been either developed specifically for or adapted to single-cell DNA sequencing data. Understanding the strengths and limitations unique to each of these methods is essential for obtaining accurate copy number profiles from single-cell DNA sequencing data. We benchmarked three widely used methods (Ginkgo, HMMcopy, and CopyNumber) on simulated as well as real datasets. To facilitate this, we developed a novel simulator of single-cell genome evolution in the presence of CNAs. Furthermore, to assess performance on empirical data where the ground truth is unknown, we introduce a phylogeny-based measure for identifying potentially erroneous inferences. While single-cell DNA sequencing is very promising for elucidating and understanding CNAs, our findings show that even the best existing method does not exceed 80% accuracy. New methods that significantly improve upon the accuracy of these three methods are needed. Furthermore, given the large datasets being generated, such methods must be computationally efficient.
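As an illustration of what scoring a CNA caller against simulated ground truth can involve, the following is a minimal sketch in Python. It is not the evaluation code used in the study: the function names, the per-bin accuracy definition, the exact-position breakpoint matching, and the toy profiles are all assumptions made for this example.

```python
from typing import List, Set, Tuple


def breakpoints(profile: List[int]) -> Set[int]:
    """Return the bin indices where the copy number differs from the previous bin."""
    return {i for i in range(1, len(profile)) if profile[i] != profile[i - 1]}


def bin_accuracy(truth: List[int], inferred: List[int]) -> float:
    """Fraction of bins whose inferred copy number equals the true copy number."""
    if len(truth) != len(inferred):
        raise ValueError("profiles must cover the same bins")
    if not truth:
        raise ValueError("profiles must not be empty")
    matches = sum(t == c for t, c in zip(truth, inferred))
    return matches / len(truth)


def breakpoint_precision_recall(
    truth: List[int], inferred: List[int]
) -> Tuple[float, float]:
    """Precision and recall of called breakpoints, matched by exact bin index.

    Exact matching is an assumption of this sketch; a tolerance window
    around each true breakpoint would be a reasonable alternative.
    """
    true_bp = breakpoints(truth)
    called_bp = breakpoints(inferred)
    hits = len(true_bp & called_bp)
    precision = hits / len(called_bp) if called_bp else 1.0
    recall = hits / len(true_bp) if true_bp else 1.0
    return precision, recall


# Hypothetical per-bin copy numbers for one simulated cell (toy data).
truth = [2, 2, 2, 3, 3, 3, 2, 2, 1, 1]
inferred = [2, 2, 2, 3, 3, 2, 2, 2, 2, 1]

print(bin_accuracy(truth, inferred))
print(breakpoint_precision_recall(truth, inferred))
```

In this toy case the two profiles agree in most bins, yet only one of the three true breakpoints is called at its exact position. A per-bin measure and a breakpoint-level measure can therefore rank the same caller quite differently.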

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Chromosome Aberrations
  • Computational Biology
  • Computer Simulation
  • DNA Copy Number Variations*
  • Gene Dosage
  • Genome, Human*
  • Humans
  • Mutation
  • Neoplasms / genetics
  • Ploidies
  • Poisson Distribution
  • ROC Curve
  • Reproducibility of Results
  • Sequence Analysis, DNA / methods*
  • Single-Cell Analysis / methods*
  • Software

Grants and funding

The study was supported by the National Science Foundation grant IIS-1812822 (L.N.). X.F.M. was supported in part by the Computational Cancer Biology Training Program (CPRIT Grant No. RP170593). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.