LDSSNV: A Linkage Disequilibrium-Based Method for the Detection of Somatic Single-Nucleotide Variants

IEEE/ACM Trans Comput Biol Bioinform. 2023 Sep-Oct;20(5):3020-3032. doi: 10.1109/TCBB.2023.3291134. Epub 2023 Oct 9.

Abstract

Single-nucleotide variants (SNVs) are very common in the human genome and have a significant effect on cellular proliferation and tumorigenesis in various cancers. Germline variants and somatic variants are the two forms of SNVs; they are the major drivers of inherited diseases and acquired tumors, respectively. Careful analysis of next-generation sequencing data from cancer genomes can provide crucial information for cancer diagnosis and treatment. However, accurately detecting SNVs and distinguishing the two forms remain challenging tasks in cancer analysis. Herein, we propose a new approach, LDSSNV, to detect somatic SNVs without matched normal samples. LDSSNV predicts SNVs by training an XGBoost classifier on a concise combination of features, and distinguishes the two forms based on linkage disequilibrium, which is a characteristic of germline mutations. LDSSNV provides two modes for distinguishing somatic variants from germline variants: a single-mode, which uses a single tumor sample, and a multiple-mode, which uses multiple tumor samples. The performance of the proposed method is assessed on both simulated data and real sequencing datasets. The analysis shows that LDSSNV outperforms competing methods and can become a robust and reliable tool for analyzing tumor genome variation.
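
For readers unfamiliar with linkage disequilibrium (LD), the following is a minimal sketch of the standard r² statistic between two biallelic loci, computed from phased haplotypes. It is not taken from the LDSSNV implementation: the abstract does not state which LD measure or input representation the method uses, and the function name `ld_r2` and the 0/1 haplotype encoding are assumptions made for illustration only.

```python
def ld_r2(haplotypes):
    """Return the r^2 LD statistic for two biallelic loci.

    `haplotypes` is a list of (a, b) pairs, where a and b are 0 (reference)
    or 1 (alternate) alleles at locus A and locus B on the same haplotype.
    This encoding is an illustrative assumption, not the LDSSNV input format.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")

    # Allele frequencies at each locus and the joint alternate-allele frequency.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    # D measures the deviation of the joint frequency from independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)

    # r^2 is undefined when either locus is monomorphic; report 0.0 here.
    if denom == 0:
        return 0.0
    return d * d / denom


if __name__ == "__main__":
    linked = [(1, 1), (1, 1), (0, 0), (0, 0)]       # alleles always co-occur
    independent = [(1, 1), (1, 0), (0, 1), (0, 0)]  # alleles assort freely
    print(ld_r2(linked))
    print(ld_r2(independent))
```

An r² near 1 indicates that the alleles at the two loci are almost always inherited together, whereas an r² near 0 indicates no association. How LDSSNV turns such a signal into a somatic-versus-germline decision is described in the full paper, not in this abstract.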