PhyDOSE: Design of follow-up single-cell sequencing experiments of tumors

PLoS Comput Biol. 2020 Oct 1;16(10):e1008240. doi: 10.1371/journal.pcbi.1008240. eCollection 2020 Oct.

Abstract

The combination of bulk and single-cell DNA sequencing data of the same tumor enables the inference of high-fidelity phylogenies that form the input to many important downstream analyses in cancer genomics. While many studies simultaneously perform bulk and single-cell sequencing, some studies have analyzed initial bulk data to identify which mutations to target in a follow-up single-cell sequencing experiment, thereby decreasing cost. Bulk data provide an additional untapped source of valuable information, composed of candidate phylogenies and associated clonal prevalences. Here, we introduce PhyDOSE, a method that uses this information to strategically optimize the design of follow-up single-cell experiments. Underpinning our method is the observation that only a small number of clones uniquely distinguish one candidate tree from all other trees. We incorporate these distinguishing features into a probabilistic model that infers the number of cells to sequence so as to confidently reconstruct the phylogeny of the tumor. We validate PhyDOSE using simulations and a retrospective analysis of a leukemia patient, concluding that PhyDOSE's computed number of cells resolves tree ambiguity even in the presence of typical single-cell sequencing errors. We also conduct a retrospective analysis on an acute myeloid leukemia cohort, demonstrating the potential to achieve similar results with a significant reduction in the number of cells sequenced. In a prospective analysis of a recent lung cancer cohort, we demonstrate the advantage of selecting cells to sequence across multiple biopsies and show that only a small number of cells suffices to disambiguate the solution space of trees. In summary, PhyDOSE proposes cost-efficient single-cell sequencing experiments that yield high-fidelity phylogenies, which will improve downstream analyses aimed at deepening our understanding of cancer biology.
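
To make the idea of inferring a number of cells from clonal prevalences concrete, the following is a minimal sketch and not PhyDOSE's actual model. It assumes a single, already-chosen set of distinguishing clones with known prevalences, cells sampled independently, and no sequencing errors; under those assumptions it uses inclusion-exclusion to find the smallest number of cells for which every clone in the set is observed at least once with a given confidence. The function names `prob_all_observed` and `min_cells` and the parameters `confidence` and `max_cells` are hypothetical and do not come from the paper.

```python
from itertools import combinations


def prob_all_observed(prevalences, k):
    """Probability that k independently sampled cells include at least one
    cell from every clone whose prevalence is listed in `prevalences`.

    Assumption (not from the paper): error-free sequencing, i.i.d. sampling.
    """
    total = 0.0
    for r in range(len(prevalences) + 1):
        for subset in combinations(prevalences, r):
            # Inclusion-exclusion term: (-1)^|T| * P(no sampled cell comes
            # from any clone in T), where T is the current subset.
            total += (-1) ** r * (1.0 - sum(subset)) ** k
    return total


def min_cells(prevalences, confidence=0.95, max_cells=100_000):
    """Smallest k such that all listed clones are observed with probability
    at least `confidence` (hypothetical helper for illustration)."""
    if any(f <= 0 for f in prevalences) or sum(prevalences) > 1 + 1e-9:
        raise ValueError("prevalences must be positive and sum to at most 1")
    for k in range(1, max_cells + 1):
        if prob_all_observed(prevalences, k) >= confidence:
            return k
    raise ValueError("confidence level not reached within max_cells")


if __name__ == "__main__":
    # A single clone at prevalence 0.5 needs 1 - 0.5**k >= 0.95.
    print(min_cells([0.5]))  # → 5
    # Two rarer clones (illustrative prevalences, not data from the paper).
    print(min_cells([0.3, 0.1]))
```

The inclusion-exclusion sum has one term per subset of clones, so this sketch is practical only for small sets, which fits the observation above that few clones distinguish a tree. Accounting for several candidate trees or for false-negative rates would require extending the model beyond what is shown here.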

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Evolution, Molecular
  • Genome / genetics
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Neoplasms / classification
  • Neoplasms / genetics*
  • Phylogeny
  • Retrospective Studies
  • Sequence Analysis, DNA
  • Single-Cell Analysis / methods*

Grants and funding

L.L.W., N.A., N.C. and M.E.K. were supported by UIUC Center for Computational Biotechnology and Genomic Medicine (grant: CSN 1624790). M.E.K. was supported by the National Science Foundation (grant: CCF 1850502). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.