Calling amplified haplotypes in next generation tumor sequence data

Genome Res. 2012 Feb;22(2):362-74. doi: 10.1101/gr.122564.111. Epub 2011 Nov 16.

Abstract

During tumor initiation and progression, cancer cells acquire a selective advantage, allowing them to outcompete their normal counterparts. Identification of the genetic changes that underlie these tumor acquired traits can provide deeper insights into the biology of tumorigenesis. Regions of copy number alterations and germline DNA variants are some of the elements subject to selection during tumor evolution. Integrated examination of inherited variation and somatic alterations holds the potential to reveal specific nucleotide alleles that a tumor "prefers" to have amplified. Next-generation sequencing of tumor and matched normal tissues provides a high-resolution platform to identify and analyze such somatic amplicons. Within an amplicon, examination of informative (e.g., heterozygous) sites deviating from a 1:1 ratio may suggest selection of that allele. A naive approach examines the reads for each heterozygous site in isolation; however, this ignores available valuable linkage information across sites. We, therefore, present a novel hidden Markov model-based method-Haplotype Amplification in Tumor Sequences (HATS)-that analyzes tumor and normal sequence data, along with training data for phasing purposes, to infer amplified alleles and haplotypes in regions of copy number gain. Our method is designed to handle rare variants and biases in read data. We assess the performance of HATS using simulated amplified regions generated from varying copy number and coverage levels, followed by amplicons in real data. We demonstrate that HATS infers the amplified alleles more accurately than does the naive approach, especially at low to intermediate coverage levels and in cases (including high coverage) possessing stromal contamination or allelic bias.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Computational Biology / methods*
  • Computer Simulation
  • Gene Amplification*
  • Haplotypes*
  • Humans
  • Models, Genetic
  • Neoplasms / genetics*
  • Reproducibility of Results
  • Sensitivity and Specificity