DeepSNP: An End-to-End Deep Neural Network with Attention-Based Localization for Breakpoint Detection in Single-Nucleotide Polymorphism Array Genomic Data

J Comput Biol. 2019 Jun;26(6):572-596. doi: 10.1089/cmb.2018.0172. Epub 2018 Dec 26.

Abstract

Clinical decision-making in cancer and other diseases relies on timely and cost-effective genome-wide testing. Classical bioinformatic algorithms, such as Rawcopy, can support genomic analysis by calling genomic breakpoints and copy-number variations (CNVs), but they often require manual data curation, which is error prone and time-consuming, and thus substantially increases the cost of genomic testing and hampers timely delivery of test results to the treating physician. We aimed to investigate whether deep learning algorithms can learn from genome-wide single-nucleotide polymorphism array (SNPa) data and improve on state-of-the-art algorithms. We developed, applied, and validated a novel deep neural network (DNN), DeepSNP. A manually curated data set of 50 SNPa analyses was used as the truth set. We show that DeepSNP can learn from SNPa data and classify the presence or absence of genomic breakpoints within large genomic windows with high precision and recall. DeepSNP was compared with well-known neural network models as well as with Rawcopy. Moreover, the use of a localization unit indicates that the network can pinpoint genomic breakpoints even though their exact locations were not provided during training. The DeepSNP results demonstrate the potential of DNN architectures to learn from genomic SNPa data and encourage further adaptation for CNV detection in SNPa and other genomic data types.

Keywords: SNPa; breakpoint detection; deep neural networks; weak label.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Comparative Genomic Hybridization / methods
  • Computational Biology / methods
  • DNA Copy Number Variations / genetics
  • Deep Learning
  • Genome, Human / genetics
  • Genomics / methods*
  • Humans
  • Neural Networks, Computer
  • Oligonucleotide Array Sequence Analysis / methods
  • Polymorphism, Single Nucleotide / genetics*