COPS: a sensitive and accurate tool for detecting somatic Copy Number Alterations using short-read sequence data from paired samples

PLoS One. 2012;7(10):e47812. doi: 10.1371/journal.pone.0047812. Epub 2012 Oct 22.

Abstract

Copy Number Alterations (CNAs), such as deletions and duplications, compose a larger percentage of genetic variations than single nucleotide polymorphisms or other structural variations in cancer genomes that undergo major chromosomal re-arrangements. It is, therefore, imperative to identify cancer-specific somatic copy number alterations (SCNAs), with respect to matched normal tissue, in order to understand their association with the disease. We have devised an accurate, sensitive, and easy-to-use tool, COPS (COpy number using Paired Samples), for detecting SCNAs. We rigorously tested the performance of COPS using simulated short-sequence reads at various SCNA sizes and coverages, read depths and read lengths, and also with real tumor:normal paired samples. We found COPS to perform better than other known SCNA detection tools for all evaluated parameters, namely, sensitivity (detection of true positives), specificity (exclusion of false positives) and size accuracy. COPS performed well for sequencing reads of all lengths when used with most upstream read alignment tools. Additionally, by incorporating a downstream boundary segmentation detection tool, the accuracy of SCNA boundaries was further improved. Here, we report an accurate, sensitive and easy-to-use tool for detecting cancer-specific SCNAs using short-read sequence data. In addition to cancer, COPS can be used for any disease as long as sequence reads from both disease and normal samples from the same individual are available. An added boundary segmentation detection module makes COPS-detected SCNA boundaries more specific for the samples studied. COPS is available at ftp://115.119.160.213 with username "cops" and password "cops".
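
The abstract names three evaluation criteria (sensitivity, specificity and size accuracy) without giving formulas. The following is a minimal sketch of how such criteria could be computed when comparing called SCNA intervals against simulated ground truth. It is not the COPS implementation: the function names, the 50% overlap threshold, the use of a false-positive call count as a stand-in for specificity, and the length-ratio definition of size accuracy are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: half-open [start, end) on a single chromosome.
Interval = Tuple[int, int]


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def evaluate(truth: List[Interval], calls: List[Interval],
             min_frac: float = 0.5) -> Tuple[float, int, float]:
    """Return (sensitivity, false_positive_calls, mean_size_accuracy).

    A true SCNA counts as detected when its best-overlapping call covers
    at least `min_frac` of its length (an assumed threshold).
    """
    detected = 0
    size_ratios = []
    for t in truth:
        t_len = t[1] - t[0]
        # Pick the call that shares the most bases with this true SCNA.
        best = max(calls, key=lambda c: overlap(t, c), default=None)
        if best is None or overlap(t, best) / t_len < min_frac:
            continue
        detected += 1
        c_len = best[1] - best[0]
        # Assumed size accuracy: shorter length over longer length, in (0, 1].
        size_ratios.append(min(t_len, c_len) / max(t_len, c_len))

    sensitivity = detected / len(truth) if truth else 0.0
    # A call is a false positive if it overlaps no true SCNA at all.
    false_positives = sum(
        1 for c in calls if all(overlap(t, c) == 0 for t in truth)
    )
    mean_size_accuracy = (
        sum(size_ratios) / len(size_ratios) if size_ratios else 0.0
    )
    return sensitivity, false_positives, mean_size_accuracy


if __name__ == "__main__":
    simulated = [(100, 200), (500, 700)]   # made-up true SCNAs
    called = [(110, 190), (1000, 1100)]    # made-up tool output
    print(evaluate(simulated, called))
```

Counting false-positive calls rather than computing a formal specificity avoids having to define true negatives, which is ambiguous for interval calls; a per-base or per-window definition would be an equally reasonable choice.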

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence / genetics*
  • Computer Simulation
  • DNA Copy Number Variations / genetics*
  • Genetic Techniques*
  • Humans
  • Neoplasms / genetics*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Software*

Grants and funding

This research was funded by the Department of Information Technology, Government of India (Ref No: 18(4)/2010-E-Infra., 31-03-2010) and the Department of IT, BT and ST, Government of Karnataka, India (Ref No: 3451-00-090-2-22). Genome sequencing data used in this study were generated with funds jointly provided by Strand Life Sciences and Narayana Hrudayalaya. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.