Fully Bayesian analysis of allele-specific RNA-seq data

Math Biosci Eng. 2019 Aug 23;16(6):7751-7770. doi: 10.3934/mbe.2019389.

Abstract

Diploid organisms have two copies of each gene, called alleles, that can be separately transcribed. The RNA abundance associated with any particular allele is known as allele-specific expression (ASE). When two alleles have polymorphisms in transcribed regions, ASE can be studied using RNA-seq read count data. ASE has characteristics that distinguish it from regular RNA-seq expression: ASE cannot be assessed for every gene, measures of ASE can be biased towards one of the alleles (the reference allele), and ASE provides two measures of expression for a single gene for each biological sample, which leads to additional complications for single-gene models. We present statistical methods for modeling ASE and detecting genes with differential allelic expression. We propose a hierarchical, overdispersed, count regression model to deal with ASE counts. The model accommodates gene-specific overdispersion, has an internal measure of the reference allele bias, and uses random effects to model the gene-specific regression parameters. Fully Bayesian inference is obtained using the fbseq package, which implements a parallel strategy to keep computation times reasonable. Simulation and real data analysis suggest the proposed model is a practical and powerful tool for the study of differential ASE.
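As a rough illustration of what an overdispersed count model for ASE data involves, the following sketch simulates paired reference/alternate allele counts for a single gene. It is not the model from the paper and does not use fbseq: the Poisson-lognormal form, the ±0.5 allele coding, and every name here (`simulate_ase_counts`, `ref_bias`, `allelic_effect`, `overdispersion_var`) are assumptions made for this example only.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method.

    Adequate for the small means used in this toy example.
    """
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_ase_counts(n_samples, intercept, ref_bias, allelic_effect,
                        overdispersion_var, seed=0):
    """Simulate (reference, alternate) read counts for one hypothetical gene.

    Assumed toy model (not taken from the paper):
        log(mu) = intercept +/- 0.5 * (ref_bias + allelic_effect) + eps
        eps ~ Normal(0, overdispersion_var)   # gene-specific overdispersion
        count ~ Poisson(mu)
    """
    rng = random.Random(seed)
    sd = math.sqrt(overdispersion_var)
    counts = []
    for _ in range(n_samples):
        pair = []
        for sign in (0.5, -0.5):  # reference allele first, then alternate
            eps = rng.gauss(0.0, sd)
            log_mu = intercept + sign * (ref_bias + allelic_effect) + eps
            pair.append(poisson_draw(rng, math.exp(log_mu)))
        counts.append(tuple(pair))
    return counts


def pooled_log_ratio(counts):
    """Crude log(reference / alternate) ratio from counts pooled over samples."""
    ref_total = sum(ref for ref, _ in counts)
    alt_total = sum(alt for _, alt in counts)
    return math.log(ref_total / alt_total)


counts = simulate_ase_counts(n_samples=200, intercept=3.0, ref_bias=0.2,
                             allelic_effect=0.5, overdispersion_var=0.1, seed=1)
ratio = pooled_log_ratio(counts)
```

In this toy setup the pooled log ratio estimates only the sum `ref_bias + allelic_effect`, so the two terms cannot be told apart from one gene alone. That confounding is one reason a model with an internal, shared measure of reference allele bias, as the abstract describes, is useful.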

Keywords: allele-specific expression; GPU; Markov chain Monte Carlo; RNA-seq; hierarchical model; shrinkage priors.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Alleles
  • Bayes Theorem*
  • Computer Graphics
  • Computer Simulation
  • Gene Library
  • Heterozygote
  • Markov Chains
  • Models, Statistical
  • Monte Carlo Method
  • RNA, Plant / genetics
  • RNA-Seq*
  • ROC Curve
  • Regression Analysis
  • Software
  • Zea mays / genetics*
  • Zea mays / physiology

Substances

  • RNA, Plant