A comparison of algorithms for identifying copy number variants in family-based whole-exome sequencing data and its implications in inheritance pattern analysis

Gene. 2023 Apr 20:861:147237. doi: 10.1016/j.gene.2023.147237. Epub 2023 Jan 30.

Abstract

Accurately identifying constitutional or germline copy number variants (gCNVs) from whole-exome sequencing data remains challenging, which has implications for the genetic diagnosis of 'rare undiagnosed disease' in the clinical setting. Although multiple algorithms have been proposed, a systematic comparison of these algorithms for calling gCNVs and analyzing inheritance patterns has yet to be fully conducted. Therefore, we empirically compared seven exome-based algorithms (XHMM, CLAMMS, CODEX2, ExomeDepth, DECoN, CN.MOPS, and GATK gCNV) for calling gCNVs in 151 individuals from 44 pedigrees, using genotyping-derived gCNVs from the same cohort as the gold standard for performance assessment. The algorithms varied in their power to identify gCNVs, although the size distributions of the gCNVs they called were similar. The number of gCNVs shared across algorithms was limited (e.g., only four gCNVs were shared among all seven algorithms); however, some algorithms showed varying degrees of consistency with one another (e.g., 1,843 gCNVs were shared between DECoN and ExomeDepth). CLAMMS and CODEX2 outperformed the remaining algorithms, with relatively higher F-scores (0.145 and 0.152, respectively). In addition, the algorithms differed in the Mendelian inconsistency of their gCNV calls, and significant challenges remained in inheritance pattern analysis. In conclusion, the choice of algorithm may have important implications for gCNV-based inheritance pattern analysis in family-based studies.
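For readers unfamiliar with how an F-score is obtained when caller output is compared against a gold-standard call set, the following is a minimal illustrative sketch in Python. It is not the authors' pipeline. The abstract does not state how a call was matched to a gold-standard gCNV, so the 50% reciprocal-overlap rule used here is an assumption, and the `CNV` record, `reciprocal_overlap`, and `f_score` names are hypothetical.

```python
from typing import NamedTuple, Sequence


class CNV(NamedTuple):
    """Hypothetical record for one gCNV call (half-open interval)."""
    sample: str
    chrom: str
    start: int
    end: int
    kind: str  # "DEL" or "DUP"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smaller of the two overlap fractions; 0.0 if the calls cannot match."""
    if (a.sample, a.chrom, a.kind) != (b.sample, b.chrom, b.kind):
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def f_score(calls: Sequence[CNV], truth: Sequence[CNV],
            min_overlap: float = 0.5) -> float:
    """Harmonic mean of precision and recall.

    Assumption: a call counts as correct when it reciprocally overlaps
    some gold-standard gCNV by at least `min_overlap`.
    """
    matched_calls = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for t in truth)
        for c in calls
    )
    matched_truth = sum(
        any(reciprocal_overlap(t, c) >= min_overlap for c in calls)
        for t in truth
    )
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the F-score is the harmonic mean of precision and recall, it stays low whenever either one is low. Values such as 0.145 and 0.152 therefore indicate that even the best-ranked algorithms recovered few gold-standard gCNVs, produced many unmatched calls, or both.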

Keywords: Algorithms; Germline copy number variants (gCNVs); Performance; Whole-exome sequencing.

MeSH terms

  • Algorithms*
  • DNA Copy Number Variations*
  • Exome
  • Exome Sequencing
  • High-Throughput Nucleotide Sequencing
  • Humans