Replicate sequencing libraries are important for quantification of allelic imbalance

Nat Commun. 2021 Jun 7;12(1):3370. doi: 10.1038/s41467-021-23544-8.

Abstract

A sensitive approach to quantitative analysis of transcriptional regulation in diploid organisms is analysis of allelic imbalance (AI) in RNA sequencing (RNA-seq) data. A near-universal practice in such studies is to prepare and sequence only one library per RNA sample. We present theoretical and experimental evidence that data from a single RNA-seq library are insufficient for reliable quantification of the contribution of technical noise to the observed AI signal; consequently, reliance on a one-replicate experimental design can lead to unaccounted-for variation in error rates in allele-specific analysis. We develop a computational approach, Qllelic, that accurately accounts for technical noise by making use of replicate RNA-seq libraries. Testing on new and existing datasets shows that application of Qllelic greatly decreases the false positive rate in allele-specific analysis while conserving appropriate signal, and thus greatly improves the reproducibility of AI estimates. We explore sources of technical overdispersion in the observed AI signal and conclude by discussing the design of RNA-seq studies addressing two biologically important questions: quantification of transcriptome-wide AI in one sample, and differential analysis of allele-specific expression between samples.
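To make the role of replicate libraries concrete, the following is a minimal sketch in Python. It is not the Qllelic procedure; it only illustrates, under simplifying assumptions, why two libraries from the same RNA sample reveal technical noise that one library cannot. The function names (`allelic_imbalance`, `overdispersion_ratio`), the per-gene count layout, and the use of a pooled binomial variance as the baseline are all assumptions made for illustration. For each gene, the squared difference in AI between the two replicates is divided by the variance expected from binomial sampling alone; an average near 1 is consistent with pure binomial sampling noise, and larger values suggest technical overdispersion.

```python
from typing import List, Tuple

# Hypothetical layout: (reference-allele reads, alternative-allele reads) for one gene.
Counts = Tuple[int, int]


def allelic_imbalance(ref: int, alt: int) -> float:
    """Fraction of allele-resolved reads that come from the reference allele."""
    return ref / (ref + alt)


def overdispersion_ratio(rep1: List[Counts], rep2: List[Counts]) -> float:
    """Mean ratio of observed to binomial-expected variance of the AI difference
    between two technical replicates (illustrative statistic, not Qllelic's)."""
    ratios = []
    for (r1, a1), (r2, a2) in zip(rep1, rep2):
        n1, n2 = r1 + a1, r2 + a2
        if n1 == 0 or n2 == 0:
            continue  # gene has no allele-resolved coverage in one replicate
        # Pooled estimate of the allelic fraction across both replicates.
        p = (r1 + r2) / (n1 + n2)
        # Variance of the AI difference if the only noise were binomial sampling.
        expected_var = p * (1 - p) * (1 / n1 + 1 / n2)
        if expected_var == 0:
            continue  # fully monoallelic gene carries no information here
        diff = allelic_imbalance(r1, a1) - allelic_imbalance(r2, a2)
        ratios.append(diff * diff / expected_var)
    if not ratios:
        raise ValueError("no informative genes")
    return sum(ratios) / len(ratios)
```

With a single library there is no second measurement to compare against, so a statistic of this kind cannot be computed at all; this is the sense in which a one-replicate design leaves the technical contribution unquantified.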

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Alleles
  • Allelic Imbalance*
  • Animals
  • Female
  • Gene Library*
  • Mice
  • Mice, 129 Strain
  • Models, Genetic
  • Polymorphism, Single Nucleotide*
  • RNA / genetics*
  • RNA / metabolism
  • Sequence Analysis, RNA / methods*
  • Transcriptome / genetics*

Substances

  • RNA