Polishing copy number variant calls on exome sequencing data via deep learning

Furkan Özden; Can Alkan; A Ercüment Çiçek

doi:10.1101/gr.274845.120

Genome Res. 2022 Jun;32(6):1170-1182. doi: 10.1101/gr.274845.120. Epub 2022 Jun 13.

Authors

Furkan Özden¹, Can Alkan¹, A Ercüment Çiçek¹,²

Affiliations

¹ Department of Computer Engineering, Bilkent University, 06800 Ankara, Turkey.
² Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA.

Abstract

Accurate and efficient detection of copy number variants (CNVs) is of critical importance owing to their significant association with complex genetic diseases. Although algorithms that use whole-genome sequencing (WGS) data provide stable results with mostly valid statistical assumptions, copy number detection on whole-exome sequencing (WES) data shows comparatively lower accuracy. This is unfortunate as WES data are cost-efficient, compact, and relatively ubiquitous. The bottleneck is primarily due to the noncontiguous nature of the targeted capture: biases in targeted genomic hybridization, GC content, targeting probes, and sample batching during sequencing. Here, we present a novel deep learning model, DECoNT, which uses the matched WES and WGS data, and learns to correct the copy number variations reported by any off-the-shelf WES-based germline CNV caller. We train DECoNT on the 1000 Genomes Project data, and we show that we can efficiently triple the duplication call precision and double the deletion call precision of the state-of-the-art algorithms. We also show that our model consistently improves the performance independent of (1) sequencing technology, (2) exome capture kit, and (3) CNV caller. Using DECoNT as a universal exome CNV call polisher has the potential to improve the reliability of germline CNV detection on WES data sets.
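
The precision figures quoted above can be read as per-class precision of WES-based calls scored against WGS-derived labels for the same events. The following is a minimal sketch of that metric only, not of DECoNT itself; the function name `call_precision`, the label strings, and the toy lists are all hypothetical and are not data or results from the paper.

```python
import math
from typing import Sequence


def call_precision(calls: Sequence[str], truth: Sequence[str], label: str) -> float:
    """Fraction of events called `label` whose matched truth label is also `label`.

    `calls` are WES-based caller outputs and `truth` are WGS-derived labels for
    the same events, in the same order (an assumption of this sketch).
    """
    if len(calls) != len(truth):
        raise ValueError("calls and truth must have the same length")
    # Truth labels of only those events the caller reported as `label`.
    called = [t for c, t in zip(calls, truth) if c == label]
    if not called:
        return math.nan  # precision is undefined when nothing was called
    return sum(t == label for t in called) / len(called)


# Toy, made-up labels for six exon-level events (illustrative only).
truth = ["DUP", "NO_CALL", "NO_CALL", "DEL", "NO_CALL", "DEL"]
raw = ["DUP", "DUP", "DUP", "DEL", "DEL", "DEL"]               # caller output
polished = ["DUP", "DUP", "NO_CALL", "DEL", "NO_CALL", "DEL"]  # after a hypothetical polishing step

for name, calls in (("raw", raw), ("polished", polished)):
    dup = call_precision(calls, truth, "DUP")
    dele = call_precision(calls, truth, "DEL")
    print(f"{name}: DUP precision={dup:.2f}, DEL precision={dele:.2f}")
```

Comparing the two rows of output shows how relabeling false-positive calls as no-calls raises precision without changing the truth set.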

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
DNA Copy Number Variations
Deep Learning*
Exome Sequencing
Exome*
High-Throughput Nucleotide Sequencing / methods
Reproducibility of Results