A powerful and flexible statistical framework for testing hypotheses of allele-specific gene expression from RNA-seq data

Genome Res. 2011 Oct;21(10):1728-37. doi: 10.1101/gr.119784.110. Epub 2011 Aug 26.

Abstract

Variation in gene expression is thought to make a significant contribution to phenotypic diversity among individuals within populations. Although high-throughput cDNA sequencing offers a unique opportunity to delineate the genome-wide architecture of regulatory variation, new statistical methods need to be developed to capitalize on the wealth of information contained in RNA-seq data sets. To this end, we developed a powerful and flexible hierarchical Bayesian model that combines information across loci to allow both global and locus-specific inferences about allele-specific expression (ASE). We applied our methodology to a large RNA-seq data set obtained in a diploid hybrid of two diverse Saccharomyces cerevisiae strains, as well as to RNA-seq data from an individual human genome. Our statistical framework accurately quantifies levels of ASE with specified false-discovery rates, achieving high reproducibility between independent sequencing platforms. We pinpoint loci that show unusual and biologically interesting patterns of ASE, including allele-specific alternative splicing and transcription termination sites. Our methodology provides a rigorous, quantitative, and high-resolution tool for profiling ASE across whole genomes.
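
To make the idea of combining information across loci concrete, the following is a minimal sketch and not the authors' model. It swaps in a much simpler empirical-Bayes beta-binomial approach: a shared Beta prior on the allelic ratio is fitted by method of moments across all loci, and a Monte Carlo estimate of the posterior probability of ASE is then computed per locus. The function names (`fit_beta_prior`, `prob_ase`), the imbalance threshold `delta`, and the read counts are all hypothetical and chosen purely for illustration.

```python
import random
from statistics import mean, variance


def fit_beta_prior(counts):
    """Method-of-moments Beta(a, b) fit to per-locus allelic ratios.

    Simplification (assumption): raw ratios are treated as draws from the
    prior, ignoring binomial sampling noise. Needs at least two loci.
    """
    ratios = [k / n for k, n in counts if n > 0]
    # Keep the mean strictly inside (0, 1) so the fit stays defined.
    m = min(max(mean(ratios), 1e-3), 1 - 1e-3)
    v = variance(ratios)
    # Bound the variance so the concentration parameter stays positive.
    v = min(max(v, 1e-6), m * (1 - m) * 0.99)
    concentration = m * (1 - m) / v - 1
    return m * concentration, (1 - m) * concentration


def prob_ase(k, n, a, b, delta=0.1, draws=20000, rng=None):
    """Monte Carlo estimate of P(|p - 0.5| > delta | k of n reads).

    The posterior on the allelic ratio p is Beta(a + k, b + n - k).
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    hits = 0
    for _ in range(draws):
        p = rng.betavariate(a + k, b + n - k)
        if abs(p - 0.5) > delta:
            hits += 1
    return hits / draws


# Made-up (allele-1 reads, total reads) pairs, for illustration only.
loci = [(52, 100), (48, 100), (90, 100), (45, 90), (10, 80), (61, 120)]

a, b = fit_beta_prior(loci)  # global step: prior shared by all loci
posterior = [prob_ase(k, n, a, b) for k, n in loci]  # locus-specific step

for (k, n), prob in zip(loci, posterior):
    print(f"{k}/{n} reads from allele 1: P(ASE) ~ {prob:.3f}")
```

In this sketch the shared prior plays the role of the global inference and the per-locus posterior plays the role of the locus-specific inference. A locus could be called as showing ASE when its posterior probability exceeds a chosen cutoff; controlling the false-discovery rate, as the abstract describes, would require an additional step that is not shown here.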

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Alleles*
  • Alternative Splicing
  • Bayes Theorem
  • Gene Expression*
  • Humans
  • Markov Chains
  • Models, Genetic*
  • Monte Carlo Method
  • Polymorphism, Single Nucleotide
  • ROC Curve
  • Saccharomyces cerevisiae / genetics
  • Saccharomyces cerevisiae Proteins / genetics
  • Sequence Analysis, RNA*
  • Transcription, Genetic

Substances

  • Saccharomyces cerevisiae Proteins