Pan-Cancer Detection and Typing by Mining Patterns in Large Genome-Wide Cell-Free DNA Sequencing Datasets

Clin Chem. 2022 Sep 1;68(9):1164-1176. doi: 10.1093/clinchem/hvac095.

Abstract

Background: Cell-free DNA (cfDNA) analysis holds great promise for non-invasive cancer screening, diagnosis, and monitoring. We hypothesized that mining the patterns of cfDNA shallow whole-genome sequencing datasets from patients with cancer could improve cancer detection.

Methods: By applying unsupervised clustering and supervised machine learning on large cfDNA shallow whole-genome sequencing datasets from healthy individuals (n = 367) and patients with different hematological (n = 238) and solid malignancies (n = 320), we identified cfDNA signatures that enabled cancer detection and typing.
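
The supervised step named in the Methods can be illustrated with a minimal sketch. The abstract does not state which features or which model were used, so everything below is an assumption made for illustration: the three-value feature vectors are hypothetical stand-ins for per-sample cfDNA profiles, and a nearest-centroid classifier is swapped in as a simple example of supervised learning, not as the authors' method.

```python
from math import dist


def fit_centroids(samples, labels):
    """Average the feature vectors of each class into one centroid per class."""
    grouped = {}
    for vector, label in zip(samples, labels):
        grouped.setdefault(label, []).append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, vector):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vector))


# Hypothetical toy profiles (three made-up features per sample), not study data.
train_x = [[1.0, 1.0, 1.0], [1.1, 0.9, 1.0], [1.6, 0.5, 1.0], [1.5, 0.6, 1.1]]
train_y = ["healthy", "healthy", "cancer", "cancer"]

centroids = fit_centroids(train_x, train_y)
prediction = predict(centroids, [1.55, 0.55, 1.0])
print(prediction)
```

The same two functions extend to cancer typing by using one label per cancer type instead of a binary healthy/cancer label.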

Results: Unsupervised clustering revealed cancer type-specific sub-grouping. Classification using a supervised machine learning model yielded accuracies of 96% and 65% in discriminating hematological and solid malignancies from healthy controls, respectively. The accuracy of disease type prediction was 85% and 70% for the hematological and solid cancers, respectively. The potential utility for managing a specific cancer was demonstrated by distinguishing invasive and borderline adnexal masses from benign ones, with areas under the curve of 0.87 and 0.74, respectively.
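
The two metrics reported in the Results, accuracy and area under the ROC curve (AUC), can be computed as in the sketch below. The labels and scores are hypothetical values chosen for illustration and are not taken from the study. The AUC is computed through its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the true label."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def auc(scores_positive, scores_negative):
    """Probability that a random positive outscores a random negative."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical classifier outputs, not study data.
true_labels = ["invasive", "benign", "invasive", "benign"]
predicted_labels = ["invasive", "benign", "benign", "benign"]
invasive_scores = [0.9, 0.8, 0.4]
benign_scores = [0.5, 0.3, 0.2, 0.1]

example_accuracy = accuracy(true_labels, predicted_labels)
example_auc = auc(invasive_scores, benign_scores)
print(example_accuracy, example_auc)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the reported 0.87 and 0.74.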

Conclusions: This approach provides a generic analytical strategy for non-invasive pan-cancer detection and cancer type prediction.

Keywords: cfDNA; ctDNA; hematological malignancies; liquid biopsy; machine learning; ovarian tumors; solid tumors.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers, Tumor / genetics
  • Cell-Free Nucleic Acids*
  • Humans
  • Neoplasms* / diagnosis
  • Neoplasms* / genetics
  • Whole Genome Sequencing

Substances

  • Biomarkers, Tumor
  • Cell-Free Nucleic Acids