Reliability of algorithmic somatic copy number alteration detection from targeted capture data

Bioinformatics. 2017 Sep 15;33(18):2791-2798. doi: 10.1093/bioinformatics/btx284.

Abstract

Motivation: Whole exome and gene panel sequencing are increasingly used for oncological diagnostics. To investigate the accuracy of somatic copy number alteration (SCNA) detection algorithms on simulated and clinical tumor samples, the precision and sensitivity of four SCNA callers were measured using 50 simulated whole exome and 50 simulated targeted gene panel datasets, and using 119 TCGA tumor samples for which SNP array data were available.
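As an illustration of the two metrics named above, the following minimal sketch computes precision and sensitivity by comparing a caller's output with a reference set. It assumes that calls can be matched at the gene level as (gene, alteration type) pairs; this representation, the function name and the example genes are hypothetical and are not taken from the scripts provided as supplementary information.

```python
from typing import Set, Tuple

# Hypothetical call representation: (gene symbol, "amp" or "del").
Call = Tuple[str, str]


def precision_sensitivity(called: Set[Call], truth: Set[Call]) -> Tuple[float, float]:
    """Return (precision, sensitivity) of `called` relative to `truth`."""
    tp = len(called & truth)   # calls confirmed by the reference
    fp = len(called - truth)   # calls absent from the reference
    fn = len(truth - called)   # reference events the caller missed
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, sensitivity


# Toy data for illustration only (not results from the study).
truth = {("ERBB2", "amp"), ("MYC", "amp"), ("CDKN2A", "del"), ("PTEN", "del")}
called = {("ERBB2", "amp"), ("CDKN2A", "del"), ("TP53", "del")}

precision, sensitivity = precision_sensitivity(called, truth)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f}")
```

In this toy example, two of three calls are confirmed and two of four reference events are recovered. A real comparison against SNP array segments would additionally need a rule for matching genomic intervals, which is omitted here.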

Results: On synthetic exome and panel data, VarScan2 mostly called false positives, whereas Control-FREEC was precise (>90% correct calls) at the cost of low sensitivity (<40% detected). ONCOCNV was slightly less precise on gene panel data, with similarly low sensitivity. This could be explained by low sensitivity for amplifications and high precision for deletions. Surprisingly, these results were not strongly affected by moderate tumor impurities; only contaminations with more than 60% non-cancerous cells resulted in strongly declining precision and sensitivity. On the 119 clinical samples, Control-FREEC and CNVkit called 71.8% and 94%, respectively, of the SCNAs found by the SNP arrays, but with a considerable number of false positives (precision 29% and 4.9%, respectively).
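To give a sense of why contamination with non-cancerous cells weakens the copy number signal, the sketch below uses a simple linear mixing model. The model is an assumption made for illustration: it treats normal cells as diploid, ignores tumor ploidy and coverage noise, and is not necessarily the model used by any of the callers evaluated here. The function name is hypothetical.

```python
import math


def expected_log2_ratio(tumor_cn: float, purity: float, normal_cn: float = 2.0) -> float:
    """Expected tumor/normal log2 coverage ratio under linear mixing.

    purity is the fraction of cancerous cells in the sample (0 to 1).
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie between 0 and 1")
    # Average copy number across the mixed cell population.
    mixed_cn = purity * tumor_cn + (1.0 - purity) * normal_cn
    if mixed_cn <= 0.0:
        return float("-inf")  # homozygous deletion in a pure tumor
    return math.log2(mixed_cn / normal_cn)


# Single-copy gain (3 copies) and single-copy loss (1 copy) at falling purity.
for purity in (1.0, 0.7, 0.4, 0.2):
    gain = expected_log2_ratio(3, purity)
    loss = expected_log2_ratio(1, purity)
    print(f"purity={purity:.1f} gain={gain:+.3f} loss={loss:+.3f}")
```

Under these assumptions, a single-copy gain yields a log2 ratio of about 0.58 in a pure tumor but only about 0.26 at 40% purity (60% non-cancerous cells), so the signal moves closer to the level of typical coverage noise as contamination increases.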

Discussion: Whole exome and targeted gene panel methods by design limit the precision of SCNA callers, making them prone to false positives. SCNA calls cannot easily be integrated into clinical pipelines that use data from targeted capture-based sequencing. If used at all, they need to be cross-validated using orthogonal methods.

Availability and implementation: Scripts are provided as supplementary information.

Contact: gunther.jansen@molecularhealth.com.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms*
  • DNA Copy Number Variations*
  • DNA, Neoplasm
  • Exome Sequencing / methods*
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Neoplasms / diagnosis
  • Neoplasms / genetics
  • Reproducibility of Results

Substances

  • DNA, Neoplasm