A Local Outlier Factor-Based Detection of Copy Number Variations From NGS Data

IEEE/ACM Trans Comput Biol Bioinform. 2021 Sep-Oct;18(5):1811-1820. doi: 10.1109/TCBB.2019.2961886. Epub 2021 Oct 7.

Abstract

Copy number variation (CNV) is a major type of genomic structural variations that play an important role in human disorders. Next generation sequencing (NGS) has fueled the advancement in algorithm design to detect CNVs at base-pair resolution. However, accurate detection of CNVs of low amplitudes remains a challenging task. This paper proposes a new computational method, CNV-LOF, to identify CNVs of full-range amplitudes from NGS data. CNV-LOF is distinctly different from traditional methods, which mainly consider aberrations from a global perspective and rely on some assumed distribution of NGS read depths. In contrast, CNV-LOF takes a local view on the read depths and assigns an outlier factor to each genome segment. With the outlier factor profile, CNV-LOF uses a boxplot procedure to declare CNVs without the reliance of any distribution assumptions. Simulation experiments indicate that CNV-LOF outperforms five existing methods with respect to F1-measure, sensitivity, and precision. CNV-LOF is further validated on real sequencing samples, yielding highly consistent results with peer methods. CNV-LOF is able to detect CNVs of low and moderate amplitudes where the other existing methods fail, and it is expected to become a routine approach for the discovery of novel CNVs on whole sequencing genome.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • DNA Copy Number Variations / genetics*
  • Genome, Human / genetics
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Sequence Analysis, DNA / methods*