ScalpelSig Designs Targeted Genomic Panels from Data to Detect Activity of Mutational Signatures

J Comput Biol. 2022 Jan;29(1):56-73. doi: 10.1089/cmb.2021.0453. Epub 2022 Jan 5.

Abstract

Over the past decade, a promising line of cancer research has used machine learning to mine statistical patterns of mutations in cancer genomes for information. Recent work shows that these statistical patterns, commonly referred to as "mutational signatures," have diverse therapeutic potential as biomarkers for cancer therapies. However, translating this potential into reality is hindered by limited access to sequencing in the clinic. Almost all methods for mutational signature analysis (MSA) rely on whole genome or whole exome sequencing data, while sequencing in the clinic is typically limited to small gene panels. To improve clinical access to MSA, we considered the question of whether targeted panels could be designed for the purpose of mutational signature detection. Here we present ScalpelSig, to our knowledge the first algorithm that automatically designs genomic panels optimized for detection of a given mutational signature. The algorithm learns from data to identify genome regions that are particularly indicative of signature activity. Using a cohort of breast cancer genomes as training data, we show that ScalpelSig panels substantially improve the accuracy of signature detection compared to baselines. We find that some ScalpelSig panels even approach the performance of whole exome sequencing, which observes over 10× as much genomic material. We test our algorithm under a variety of conditions, showing that its performance generalizes to another dataset of breast cancers, to smaller panel sizes, and to smaller amounts of training data.
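
The abstract does not say how ScalpelSig scores genome regions, so the following is only a toy illustration of the general idea of learning a panel from data, not the published algorithm. It assumes the genome has been split into fixed windows, that each training sample has a per-window mutation count and a known active/inactive label for the signature of interest, and that a window is "indicative" if its mean mutation count differs between the two groups. The function names `window_scores` and `design_panel` and the scoring rule are all hypothetical.

```python
from typing import List, Sequence


def window_scores(counts: Sequence[Sequence[int]],
                  active: Sequence[bool]) -> List[float]:
    """Score each window by how much more mutated it is in signature-active samples.

    counts[s][w] is the mutation count of sample s in window w.
    active[s] says whether the signature is active in sample s.
    ASSUMPTION: difference of group means stands in for the real scoring function.
    """
    n_windows = len(counts[0])
    n_active = sum(1 for a in active if a)
    n_inactive = len(active) - n_active
    scores = []
    for w in range(n_windows):
        total_active = sum(c[w] for c, a in zip(counts, active) if a)
        total_inactive = sum(c[w] for c, a in zip(counts, active) if not a)
        # max(..., 1) avoids division by zero when one group is empty.
        mean_active = total_active / max(n_active, 1)
        mean_inactive = total_inactive / max(n_inactive, 1)
        scores.append(mean_active - mean_inactive)
    return scores


def design_panel(counts: Sequence[Sequence[int]],
                 active: Sequence[bool],
                 panel_size: int) -> List[int]:
    """Return the indices of the panel_size highest-scoring windows, in genome order."""
    scores = window_scores(counts, active)
    ranked = sorted(range(len(scores)), key=lambda w: scores[w], reverse=True)
    return sorted(ranked[:panel_size])


# Tiny made-up cohort: 4 samples, 3 windows (not real data).
toy_counts = [
    [5, 0, 2],  # signature active
    [4, 1, 1],  # signature active
    [0, 1, 0],  # signature inactive
    [1, 0, 1],  # signature inactive
]
toy_active = [True, True, False, False]
panel = design_panel(toy_counts, toy_active, panel_size=2)
```

A real panel design would also need to account for window length, a total sequencing budget, and evaluation on held-out samples, none of which this sketch attempts.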

Keywords: cancer genomics; combinatorial optimization; mutational signatures.

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Intramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Breast Neoplasms / genetics
  • Cohort Studies
  • Computational Biology
  • DNA Mutational Analysis / statistics & numerical data*
  • Databases, Genetic / statistics & numerical data
  • Female
  • Genomics / statistics & numerical data*
  • Humans
  • Machine Learning
  • Mutation
  • Whole Genome Sequencing / statistics & numerical data