STIC: Predicting Single Nucleotide Variants and Tumor Purity in Cancer Genome

IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2692-2701. doi: 10.1109/TCBB.2020.2975181. Epub 2021 Dec 8.

Abstract

Single nucleotide variants (SNVs) play an important role in cellular proliferation and tumorigenesis in various types of human cancer. Next-generation sequencing (NGS) provides high-throughput data at an unprecedented resolution for predicting SNVs. Many computational methods exist for discovering either germline or somatic SNVs from NGS data, but very few are versatile enough to adapt to every situation. In the absence of matched normal samples, predicting somatic SNVs from single-tumor samples is considerably more challenging, especially when the tumor purity is unknown. Here, we propose a new approach, STIC, to predict somatic SNVs and estimate tumor purity from NGS data without matched normal samples. The main features of STIC include: (1) extracting a set of SNV-relevant features at each site and training a back-propagation (BP) neural network on these features to predict SNVs; (2) creating an iterative process that distinguishes somatic SNVs from germline ones by perturbing allele frequencies; and (3) establishing a relationship between tumor purity and the allele frequencies of somatic SNVs to estimate the purity accurately. We quantitatively evaluate the performance of STIC on both simulated and real sequencing datasets; the results indicate that STIC outperforms competing methods.
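The abstract does not state the exact relationship STIC uses between tumor purity and somatic allele frequencies. As a rough illustration only, the sketch below uses the textbook mixture model: for a clonal, heterozygous somatic SNV in a copy-neutral diploid region, the expected variant allele frequency (VAF) is about half the purity, whereas a heterozygous germline variant stays near 0.5 regardless of purity. The function names `expected_vaf` and `estimate_purity`, the use of the median, and the example VAF values are all assumptions made for this example and are not taken from the paper.

```python
from statistics import median


def expected_vaf(purity, tumor_copies=2, normal_copies=2, mutant_copies=1):
    """Expected VAF of a clonal somatic SNV in a tumor/normal cell mixture.

    Assumption (not from the paper): reads are sampled in proportion to
    the number of DNA copies contributed by tumor and normal cells.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie in [0, 1]")
    total_copies = purity * tumor_copies + (1.0 - purity) * normal_copies
    return purity * mutant_copies / total_copies


def estimate_purity(somatic_vafs):
    """Estimate purity as twice the median somatic VAF, capped at 1.0.

    Assumption: the SNVs are clonal, heterozygous, and lie in
    copy-neutral diploid regions, so VAF is roughly purity / 2.
    """
    if not somatic_vafs:
        raise ValueError("at least one somatic VAF is required")
    return min(1.0, 2.0 * median(somatic_vafs))


# Hypothetical VAFs for five predicted somatic SNVs (illustrative values).
vafs = [0.28, 0.31, 0.30, 0.33, 0.29]
purity = estimate_purity(vafs)
print(f"estimated purity: {purity:.2f}")
print(f"expected somatic VAF at that purity: {expected_vaf(purity):.2f}")
```

The median is used here so that a few subclonal or copy-number-altered sites do not dominate the estimate. A real single-sample caller would also need to model copy number and subclonality, which this sketch ignores.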

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Genome, Human / genetics*
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Neoplasms / genetics*
  • Polymorphism, Single Nucleotide / genetics*
  • Sequence Analysis, DNA