DL-CNV: A deep learning method for identifying copy number variations based on next generation target sequencing

Math Biosci Eng. 2019 Sep 30;17(1):202-215. doi: 10.3934/mbe.2020011.

Abstract

Copy number variations (CNVs) play an important role in many types of cancer. With the rapid development of next generation sequencing (NGS) techniques, many methods for detecting CNVs of a single sample have emerged. However, they (i) require genome-wide data of both case and control samples, (ii) depend on sequencing depth and GC content correction algorithms, or (iii) rely on statistical models built on CNV-positive and CNV-negative sample datasets. These requirements make them costly in data analysis and ineffective on targeted sequencing data. In this study, we developed a novel alignment-free method called DL-CNV to call CNVs from the target sequencing data of a single sample. Specifically, we collected two sets of samples. The first set consists of 1301 samples, of which 272 have CNVs in ERBB2, and the second set is composed of 1148 samples, of which 63 contain CNVs in MET. DL-CNV achieved a testing AUC of 0.9454 for ERBB2 and 0.9220 for MET. Furthermore, we hope that CNV detection can be made more accurate by incorporating clinical "gold standard" (e.g., FISH) information, and that this work provides a new research direction that can supplement existing NGS methods.
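
To put the reported sample counts and AUC values in context, the following is a minimal sketch of how class prevalence and a rank-based AUC can be computed. It is not the authors' evaluation code: the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration. Only the sample counts (272 of 1301 for ERBB2, 63 of 1148 for MET) come from the abstract.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    sample scores higher than a randomly chosen negative one (ties count 0.5).
    Hypothetical helper for illustration, not part of DL-CNV."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Fraction of CNV-positive samples, from the counts given in the abstract.
erbb2_prevalence = 272 / 1301
met_prevalence = 63 / 1148

# Toy labels and scores, invented purely to exercise the function.
toy_labels = [1, 1, 0, 0, 0, 1]
toy_scores = [0.9, 0.6, 0.7, 0.2, 0.1, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)

print(f"ERBB2 positive fraction: {erbb2_prevalence:.3f}")
print(f"MET positive fraction:   {met_prevalence:.3f}")
print(f"Toy AUC:                 {toy_auc:.3f}")
```

Because the MET set contains far fewer positives than the ERBB2 set, a threshold-free metric such as AUC is less sensitive to this imbalance than raw accuracy, which may be one reason it is the figure reported.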

Keywords: convolutional neural network; copy number variation; deep learning; next generation sequencing; target sequencing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Area Under Curve
  • Carcinoma, Non-Small-Cell Lung / genetics*
  • DNA Copy Number Variations*
  • Databases, Factual
  • Deep Learning*
  • Exons
  • False Positive Reactions
  • Genome, Human
  • Genome-Wide Association Study
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • In Situ Hybridization, Fluorescence
  • Lung Neoplasms / genetics*
  • Proto-Oncogene Proteins c-met / genetics
  • ROC Curve
  • Receptor, ErbB-2 / genetics
  • Reproducibility of Results
  • Sensitivity and Specificity

Substances

  • ERBB2 protein, human
  • MET protein, human
  • Proto-Oncogene Proteins c-met
  • Receptor, ErbB-2