Improving Copy Number Variant Detection from Sequencing Data with a Combination of Programs and a Predictive Model

J Mol Diagn. 2020 Jan;22(1):40-49. doi: 10.1016/j.jmoldx.2019.08.009. Epub 2019 Nov 13.

Abstract

Bioinformatics tools for analyzing copy number variants (CNVs) from massively parallel sequencing (MPS) data are less well developed than those for other variant types. We present an efficient bioinformatics pipeline for CNV detection from gene panel MPS data in neuromuscular disorders. CNVs were introduced in silico into samples sequenced with a previously published MPS gene panel. The in silico CNVs in these samples were analyzed with four programs having complementary CNV detection ranges: CoNIFER, XHMM, ExomeDepth, and CODEX. A logistic regression model was trained on the resulting set of in silico CNV detections to predict which of the CNV detections from a sample are true positives. This model was validated using 66 control samples, each with a verified true-positive (n = 58) or false-positive (n = 8) CNV detection. Applying all four programs together detected in silico CNVs with higher sensitivity than any other program combination or any single program. Furthermore, a model using CNV detection-specific scores from all four programs as variables performed best overall in the validation. No single program could detect all CNV sizes and types equally well or with sufficient accuracy. Therefore, a combination of carefully selected programs should be used to maximize detection accuracy. In addition, the detected CNVs should be reviewed with a statistical model to streamline and standardize the filtering of detections for annotation.
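
The modeling step described in the abstract, a logistic regression over per-detection scores from the four programs, can be illustrated with a minimal sketch. The abstract does not specify the score definitions, their scaling, the fitting procedure, or the software used, so everything below is an assumption made for illustration: the scores are synthetic numbers drawn at random, the helper names (`fit_logistic`, `predict_proba`, `make_toy_data`) are hypothetical, and the model is fitted with plain gradient descent rather than whatever method the authors used.

```python
import math
import random

# Program names come from the abstract; one score per program per CNV detection
# is an assumption about how the model's variables are laid out.
PROGRAMS = ["CoNIFER", "XHMM", "ExomeDepth", "CODEX"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, scores):
    """Estimated probability that a detection with these scores is a true positive."""
    return sigmoid(bias + sum(w * s for w, s in zip(weights, scores)))


def fit_logistic(rows, labels, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent on the mean log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for scores, y in zip(rows, labels):
            err = predict_proba(weights, bias, scores) - y
            for j, s in enumerate(scores):
                grad_w[j] += err * s
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def make_toy_data(n_per_class=100, seed=0):
    """Synthetic stand-in for in silico detections (NOT data from the study).

    True positives (label 1) get scores centered at +1 and false positives
    (label 0) at -1, purely so that the example has something to learn.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for label, mean in ((1, 1.0), (0, -1.0)):
        for _ in range(n_per_class):
            rows.append([rng.gauss(mean, 1.0) for _ in PROGRAMS])
            labels.append(label)
    return rows, labels


rows, labels = make_toy_data()
weights, bias = fit_logistic(rows, labels)

# Score a detection that all four programs support strongly, and one that none do.
p_supported = predict_proba(weights, bias, [2.0, 2.0, 2.0, 2.0])
p_unsupported = predict_proba(weights, bias, [-2.0, -2.0, -2.0, -2.0])
```

In a setup like this, detections whose predicted probability falls below a chosen cutoff would be filtered out before annotation. The cutoff itself is a design choice that the abstract does not report.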

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Cohort Studies
  • Computational Biology / methods*
  • Computer Simulation
  • DNA Copy Number Variations / genetics*
  • Exome
  • Exons
  • Female
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Logistic Models
  • Male
  • Models, Statistical*
  • Mosaicism
  • Neuromuscular Diseases / genetics*
  • Polymorphism, Single Nucleotide
  • Sensitivity and Specificity
  • Sequence Analysis, DNA