A generalizable machine learning framework for classifying DNA repair defects using ctDNA exomes

NPJ Precis Oncol. 2023 Mar 13;7(1):27. doi: 10.1038/s41698-023-00366-z.

Abstract

Specific classes of DNA damage repair (DDR) defects can drive sensitivity to emerging therapies for metastatic prostate cancer. However, biomarker approaches based on DDR gene sequencing do not accurately predict DDR deficiency or treatment benefit. Somatic alteration signatures may identify DDR deficiency but have historically required whole-genome sequencing of tumour tissue. We assembled whole-exome sequencing data for 155 high-ctDNA-fraction plasma cell-free DNA samples and matched leukocyte DNA from patients with metastatic prostate or bladder cancer. Labels for DDR gene alterations were established using deep targeted sequencing. Per-sample mutation and copy-number features were used to train XGBoost ensemble models. Naive somatic features and trinucleotide signatures were associated with specific DDR gene alterations but were insufficient to resolve each class. Conversely, XGBoost-derived models showed strong performance, including areas under the curve (AUC) of 0.99, 0.99, and 1.00 for identifying BRCA2, CDK12, and mismatch repair deficiency, respectively, in metastatic prostate cancer. Our machine learning approach re-classified several samples exhibiting genomic features inconsistent with their original labels, identified a metastatic bladder cancer sample with a homozygous BRCA2 copy loss, and outperformed an existing exome-based classifier for BRCA2 deficiency. We present DARC Sign (DnA Repair Classification SIGNatures), a public machine learning tool that leverages clinically practical liquid biopsy specimens to simultaneously identify multiple types of DDR deficiency in metastatic prostate cancer. We posit that it will be useful for understanding differential responses to DDR-directed therapies in ongoing clinical trials and may ultimately enable prospective identification of prostate cancers with phenotypic evidence of DDR deficiency.
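Trinucleotide signatures such as those referred to in the abstract are conventionally built on the 96-channel single-base-substitution (SBS96) representation: each substitution is collapsed onto its pyrimidine (C or T) reference strand and indexed by its 5' and 3' flanking bases. The sketch below shows one way to turn a sample's somatic SNVs into such a per-sample feature vector. It assumes each mutation is already supplied as its reference trinucleotide context plus the alternate base, and that counts are normalised to proportions; both choices, and the function names `sbs96_channel` and `sbs96_profile`, are hypothetical illustrations, not the DARC Sign implementation.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# 6 pyrimidine-reference substitution types x 16 flanking contexts = 96 channels,
# ordered as A[C>A]A, A[C>A]C, ..., T[T>G]T.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
CHANNELS = [
    f"{five}[{sub}]{three}"
    for sub in SUBSTITUTIONS
    for five, three in product("ACGT", repeat=2)
]


def sbs96_channel(context: str, alt: str) -> str:
    """Map a reference trinucleotide (mutated base in the middle) and alt base to an SBS96 label."""
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or set(context) - set("ACGT"):
        raise ValueError(f"invalid trinucleotide context: {context!r}")
    if len(alt) != 1 or alt not in "ACGT" or alt == context[1]:
        raise ValueError(f"invalid alternate base: {alt!r}")
    # Purine reference (A or G): report the mutation on the opposite strand.
    if context[1] in "AG":
        context, alt = revcomp(context), alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def sbs96_profile(mutations):
    """Return a 96-element vector of channel proportions (assumption: normalised, not raw counts)."""
    counts = Counter(sbs96_channel(ctx, alt) for ctx, alt in mutations)
    total = sum(counts.values())
    return [counts[ch] / total if total else 0.0 for ch in CHANNELS]


if __name__ == "__main__":
    # A G>A change in AGT is reported as C>T on the reverse strand: A[C>T]T.
    print(sbs96_channel("AGT", "A"))
    profile = sbs96_profile([("TCA", "T"), ("AGT", "A")])
    print(len(profile), sum(profile))
```

Collapsing to the pyrimidine strand keeps the feature space at 96 dimensions rather than 192, since a mutation and its reverse complement describe the same double-stranded event. Vectors like this could sit alongside copy-number features as classifier inputs, but how the published models combine them is not specified here.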