A genomic random interval model for statistical analysis of genomic lesion data

Bioinformatics. 2013 Sep 1;29(17):2088-95. doi: 10.1093/bioinformatics/btt372. Epub 2013 Jul 10.

Abstract

Motivation: Tumors exhibit numerous genomic lesions such as copy number variations, structural variations and sequence variations. It is difficult to determine whether a specific constellation of lesions observed across a cohort of tumors provides statistically significant evidence that the lesions target a set of genes that may lie on different chromosomes yet are all involved in a single biological process or function.

Results: We introduce the genomic random interval (GRIN) statistical model and analysis method, which evaluates the statistical significance of the abundance of genomic lesions that overlap a specific locus or a pre-defined set of biologically related loci. The GRIN model retains certain biologically important properties of genomic lesions that are ignored by other methods. In a simulation study and in two example analyses of leukemia genomic lesion data, GRIN was more effective at identifying important loci as significant than three methods based on a permutation-of-markers model. In both examples, GRIN also identified biologically relevant pathways with a significant abundance of lesions.
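To make the idea of a random interval model concrete, the following is a minimal Python sketch built on simplifying assumptions chosen for illustration; it is not the GRIN R package and not the authors' exact formulation. It assumes that each lesion keeps its observed length but has its start placed uniformly at random so that it lies entirely on the same chromosome, that a subject's lesions are placed independently of one another, and that subjects are independent. Under those assumptions, the number of subjects with a lesion overlapping a single locus is a sum of independent Bernoulli variables with subject-specific probabilities, so an upper-tail p-value can be computed by direct convolution. The function names `hit_probability`, `subject_hit_probability` and `upper_tail_pvalue`, and all data in the example, are hypothetical.

```python
from typing import Sequence, Tuple


def hit_probability(chrom_len: float, locus: Tuple[float, float], lesion_len: float) -> float:
    """P(a lesion of length lesion_len overlaps locus = (a, b)) when its start is
    drawn uniformly from [0, chrom_len - lesion_len], i.e. the lesion keeps its
    length and stays entirely on the chromosome (an assumption of this sketch)."""
    a, b = locus
    span = chrom_len - lesion_len  # admissible range of start positions
    if span <= 0:
        return 1.0  # the lesion covers the whole chromosome, so it must overlap
    # The lesion [s, s + lesion_len] overlaps [a, b] exactly when a - lesion_len <= s <= b.
    lo = max(0.0, a - lesion_len)
    hi = min(span, b)
    return max(0.0, hi - lo) / span


def subject_hit_probability(chrom_len: float, locus: Tuple[float, float],
                            lesion_lens: Sequence[float]) -> float:
    """P(at least one of a subject's lesions on this chromosome overlaps the locus),
    treating the subject's lesions as independently placed (assumption)."""
    miss = 1.0
    for w in lesion_lens:
        miss *= 1.0 - hit_probability(chrom_len, locus, w)
    return 1.0 - miss


def upper_tail_pvalue(subject_probs: Sequence[float], observed: int) -> float:
    """P(X >= observed), where X is the number of subjects hit and is a sum of
    independent Bernoulli(p_i) variables (a Poisson-binomial distribution)."""
    if observed <= 0:
        return 1.0
    dist = [1.0]  # dist[j] = P(exactly j of the subjects processed so far are hit)
    for p in subject_probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1.0 - p)  # this subject is not hit
            new[j + 1] += q * p      # this subject is hit
        dist = new
    return sum(dist[observed:])


if __name__ == "__main__":
    # Hypothetical data: one chromosome, one locus, lesion lengths for four subjects.
    chrom_len = 1_000_000
    locus = (500_000, 520_000)
    cohort = [[200_000], [50_000, 10_000], [300_000], []]
    observed = 3  # hypothetical number of subjects whose lesions overlap the locus
    probs = [subject_hit_probability(chrom_len, locus, lens) for lens in cohort]
    print("per-subject hit probabilities:", [round(p, 4) for p in probs])
    print("p-value:", upper_tail_pvalue(probs, observed))
```

In this sketch a longer lesion is more likely to overlap a given locus than a shorter one, because its observed length is preserved when it is relocated. Extending it to a pre-defined set of loci, as described above, would require a joint calculation over all loci in the set, which this example does not attempt.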

Availability: An R package will be made freely available on CRAN and at www.stjuderesearch.org/site/depts/biostats/software.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA Copy Number Variations
  • Genetic Loci
  • Genetic Variation*
  • Genomics / methods
  • Humans
  • Models, Statistical*
  • Neoplasms / genetics*
  • Precursor Cell Lymphoblastic Leukemia-Lymphoma / genetics
  • Precursor T-Cell Lymphoblastic Leukemia-Lymphoma / genetics