ifCNV: A novel isolation-forest-based package to detect copy-number variations from various targeted NGS datasets

Mol Ther Nucleic Acids. 2022 Sep 22:30:174-183. doi: 10.1016/j.omtn.2022.09.009. eCollection 2022 Dec 13.

Abstract

Copy-number variations (CNVs) are an essential component of genetic variation distributed across large parts of the human genome. Both CNV detection from next-generation sequencing (NGS) data and artificial intelligence algorithms have progressed in recent years. However, only a few tools have taken advantage of machine-learning algorithms for CNV detection, and none propose using artificial intelligence to automatically detect probable CNV-positive samples. The most developed approach compares the samples of interest against a reference (normal) dataset, and it is well known that selecting appropriate normal samples is a challenging task that dramatically influences the precision of results in all CNV-detecting tools. With careful consideration of these issues, we propose here ifCNV, a new software package based on isolation forests that creates its own reference, available in R and Python with customizable parameters. ifCNV combines artificial intelligence, in the form of two isolation forests, with a comprehensive scoring method to faithfully detect CNVs among various samples. It was validated using targeted NGS datasets from diverse origins (capture and amplicon, germline and somatic), and it exhibits high sensitivity, specificity, and accuracy. ifCNV is publicly available open-source software (https://github.com/SimCab-CHU/ifCNV) that allows the detection of CNVs in many clinical situations.
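
To illustrate the idea of letting an isolation forest pick out a probable CNV-positive sample so that the remaining samples can serve as the reference, here is a minimal sketch. It is not the ifCNV API or the authors' implementation: the read-count matrix is simulated, every name (`simulate_counts`, `normalise`, `anomaly_scores`), the 4x amplification, and the 2.0 ratio cut-off are assumptions made for illustration, and a single small isolation forest written from scratch stands in for the two forests and the scoring method described above.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649


def avg_path_length(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random cut point."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    feat = rng.randrange(len(rows[0]))
    lo = min(r[feat] for r in rows)
    hi = max(r[feat] for r in rows)
    if lo == hi:
        return ("leaf", len(rows))
    cut = rng.uniform(lo, hi)
    left = [r for r in rows if r[feat] < cut]
    right = [r for r in rows if r[feat] >= cut]
    return ("node", feat, cut,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, row, depth=0):
    """Depth at which row lands, plus a correction for unsplit leaves."""
    if tree[0] == "leaf":
        return depth + avg_path_length(tree[1])
    _, feat, cut, left, right = tree
    return path_length(left if row[feat] < cut else right, row, depth + 1)


def anomaly_scores(rows, n_trees=200, seed=0):
    """Isolation-forest score per row: closer to 1 means easier to isolate."""
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(len(rows)))
    trees = [build_tree(rows, 0, max_depth, rng) for _ in range(n_trees)]
    norm = avg_path_length(len(rows))
    return [2.0 ** (-(sum(path_length(t, r) for t in trees) / n_trees) / norm)
            for r in rows]


def normalise(counts):
    """Divide each amplicon by its cross-sample median, then each sample by its own median."""
    n_amp = len(counts[0])
    amp_median = [statistics.median([row[j] for row in counts]) for j in range(n_amp)]
    ratios = [[row[j] / amp_median[j] for j in range(n_amp)] for row in counts]
    return [[r / statistics.median(row) for r in row] for row in ratios]


def simulate_counts(n_samples=12, n_amplicons=20, spiked_sample=7,
                    spiked_amplicons=range(5, 11), seed=1):
    """Hypothetical read counts; one sample carries a 4x amplification on six amplicons."""
    rng = random.Random(seed)
    baseline = [rng.uniform(200, 800) for _ in range(n_amplicons)]
    counts = []
    for s in range(n_samples):
        library_size = rng.uniform(0.9, 1.1)  # modest depth variation (assumption)
        row = [b * library_size * rng.uniform(0.95, 1.05) for b in baseline]
        if s == spiked_sample:
            for j in spiked_amplicons:
                row[j] *= 4.0
        counts.append(row)
    return counts


counts = simulate_counts()
ratios = normalise(counts)
scores = anomaly_scores(ratios)

# Treat the highest-scoring sample as the probable CNV-positive one;
# the remaining samples become the self-built reference.
suspect = max(range(len(scores)), key=scores.__getitem__)
reference = [row for i, row in enumerate(ratios) if i != suspect]
ref_median = [statistics.median([row[j] for row in reference])
              for j in range(len(ratios[0]))]

# Simplified localisation: amplicons whose ratio to the reference exceeds 2.0.
altered = [j for j, r in enumerate(ratios[suspect]) if r / ref_median[j] > 2.0]
print("suspect sample:", suspect)
print("amplicons above 2x the reference:", altered)
```

The forest is written with the standard library only so that the sketch runs without dependencies; in practice a maintained implementation such as scikit-learn's `IsolationForest` would be the usual choice. Flagging only the single top-scoring sample is a simplification, since a real run may contain several CNV-positive samples or none.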

Keywords: CNV detection; MT: Bioinformatics; Python open-source package; R open-source package; artificial intelligence; localization scoring; machine learning.