SLMSuite: a suite of algorithms for segmenting genomic profiles

BMC Bioinformatics. 2017 Jun 28;18(1):321. doi: 10.1186/s12859-017-1734-5.

Abstract

Background: The identification of copy number variants (CNVs) is essential for studying human genetic variation and for understanding the genetic basis of Mendelian disorders and cancers. At present, genome-wide detection of CNVs can be achieved using microarray or second-generation sequencing (SGS) data. Although these technologies are very different, the genomic profiles that they generate are mathematically very similar: noisy signals in which a decrease or increase across consecutive data points represents a deletion or duplication of DNA. In this framework, the most important step of the analysis is segmenting the genomic profiles to identify the boundaries of genomic regions with increased or decreased signal.
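As a rough illustration of what such a profile looks like, the sketch below simulates a noisy log2-ratio signal in which a run of consecutive points is shifted down (deletion) or up (duplication). The function name `simulate_profile`, the diploid baseline of 2 copies, the Gaussian noise model and the value of `sigma` are all assumptions made for this example; they are not taken from SLMSuite.

```python
import math
import random


def simulate_profile(segments, sigma=0.2, seed=0):
    """Return a noisy log2-ratio profile as a list of floats.

    segments: list of (length, copy_number) pairs.
    Assumes a diploid baseline, so the noise-free level of a segment
    is log2(copy_number / 2). Noise is Gaussian with std dev sigma.
    """
    rng = random.Random(seed)
    profile = []
    for length, copy_number in segments:
        level = math.log2(copy_number / 2)
        profile.extend(rng.gauss(level, sigma) for _ in range(length))
    return profile


# Normal, one-copy deletion, normal, one-copy duplication, normal.
segments = [(200, 2), (50, 1), (200, 2), (50, 3), (200, 2)]
profile = simulate_profile(segments)
```

Under these assumptions the deletion sits near log2(1/2) = -1 and the duplication near log2(3/2), roughly 0.58. The segmentation task is to recover the four boundaries from `profile` alone.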

Results: Here we introduce SLMSuite, a collection of algorithms, based on shifting level models (SLM), to segment genomic profiles from array and SGS experiments. The SLM algorithms take as input the log-transformed genomic profiles from SGS or microarray experiments and output segmentation results. We apply our method to the analysis of synthetic genomic profiles and real whole-genome sequencing data, and demonstrate that it outperforms the state-of-the-art circular binary segmentation algorithm in terms of sensitivity, specificity and computational speed.
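To make the input/output contract concrete (a log-transformed profile in, segment boundaries out), here is a minimal sketch of a segmentation routine. It uses plain greedy binary segmentation on the squared error, which is neither the SLM method nor circular binary segmentation; it only stands in for them to show the shape of the problem. The names `best_split` and `segment`, and the parameters `threshold` and `min_size`, are hypothetical.

```python
def best_split(values, start, end, min_size):
    """Find the split of values[start:end] that most reduces squared error.

    Returns (gain, index); index is None if no split improves the fit.
    """
    n = end - start
    total = sum(values[start:end])
    best_gain, best_k = 0.0, None
    left_sum = sum(values[start:start + min_size])
    for k in range(start + min_size, end - min_size + 1):
        n_left, n_right = k - start, end - k
        right_sum = total - left_sum
        # Reduction in the sum of squared errors when each side gets its own mean.
        gain = left_sum**2 / n_left + right_sum**2 / n_right - total**2 / n
        if gain > best_gain:
            best_gain, best_k = gain, k
        left_sum += values[k]
    return best_gain, best_k


def segment(values, threshold=1.0, min_size=3):
    """Return sorted breakpoint indices found by greedy binary segmentation."""
    breakpoints = []
    stack = [(0, len(values))]
    while stack:
        start, end = stack.pop()
        gain, k = best_split(values, start, end, min_size)
        if k is not None and gain > threshold:
            breakpoints.append(k)
            stack.append((start, k))
            stack.append((k, end))
    return sorted(breakpoints)


# Noise-free toy profile: a deletion (log2 ratio -1) spanning indices 20..29.
toy_profile = [0.0] * 20 + [-1.0] * 10 + [0.0] * 20
breakpoints = segment(toy_profile)
print(breakpoints)  # [20, 30]
```

On noisy data the value of `threshold` would have to be tuned to the noise level, and a greedy split of this kind can miss short segments in the middle of a long one; dedicated segmentation algorithms are designed to handle such cases better.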

Conclusion: SLMSuite contains an R library with the segmentation methods and three wrappers that allow them to be used from Python, Ruby and C++. SLMSuite is freely available at https://sourceforge.net/projects/slmsuite.

Keywords: Bioinformatics; Genomics; SLM; Software.

MeSH terms

  • Algorithms*
  • DNA / chemistry
  • DNA / genetics
  • DNA Copy Number Variations
  • Genome, Human
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Internet
  • Sequence Analysis, DNA
  • User-Computer Interface*

Substances

  • DNA