Widespread Contamination of Arabidopsis Embryo and Endosperm Transcriptome Data Sets

Plant Cell. 2017 Apr;29(4):608-617. doi: 10.1105/tpc.16.00845. Epub 2017 Mar 17.

Abstract

A major goal of global gene expression profiling in plant seeds has been to investigate the parental contributions to the transcriptomes of early embryos and endosperm. However, consistency between independent studies has been poor, leading to considerable debate. We have developed a statistical tool that reveals the presence of substantial RNA contamination from maternal tissues in nearly all published Arabidopsis thaliana endosperm and early embryo transcriptomes generated in these studies. We demonstrate that maternal RNA contamination explains the poor reproducibility of these transcriptomic data sets. Furthermore, we found that RNA contamination from maternal tissues has been repeatedly misinterpreted as epigenetic phenomena, which has resulted in inaccurate conclusions regarding the parental contributions to both the endosperm and early embryo transcriptomes. After accounting for maternal RNA contamination, no published genome-wide data set supports the concept of delayed paternal genome activation in plant embryos. Moreover, our analysis suggests that maternal and paternal genomic imprinting are equally rare events in Arabidopsis endosperm. Our publicly available software (https://github.com/Gregor-Mendel-Institute/tissue-enrichment-test) can help the community assess the level of contamination in transcriptome data sets generated from both seed and non-seed tissues.
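As a rough illustration of how contamination from maternal tissue might be flagged statistically, the sketch below runs a hypergeometric over-representation test. It asks whether genes assumed to mark a maternal tissue (for example, seed coat) appear among the most highly expressed genes of a sample more often than chance would predict. This is a generic technique and is not taken from the authors' software; the function names, the `top_n` cutoff, and the gene identifiers are all hypothetical.

```python
from math import comb


def hypergeom_sf(k, total_genes, total_markers, drawn):
    """Return P(X >= k) for a hypergeometric draw without replacement.

    total_genes   -- number of genes in the data set
    total_markers -- number of those genes that are tissue markers
    drawn         -- number of genes sampled (here: the top expressed genes)
    """
    upper = min(total_markers, drawn)
    denom = comb(total_genes, drawn)
    numer = sum(
        comb(total_markers, i) * comb(total_genes - total_markers, drawn - i)
        for i in range(k, upper + 1)
    )
    return numer / denom


def marker_enrichment(expression, markers, top_n):
    """Test whether marker genes are over-represented among the top_n genes.

    expression -- dict mapping gene ID to an expression value (hypothetical input)
    markers    -- set of gene IDs assumed to be specific to a maternal tissue
    top_n      -- how many of the most highly expressed genes to inspect
    Returns (number of markers in the top_n, one-sided p-value).
    """
    ranked = sorted(expression, key=expression.get, reverse=True)
    top = set(ranked[:top_n])
    usable_markers = markers & set(expression)  # ignore markers not measured
    hits = len(top & usable_markers)
    p_value = hypergeom_sf(hits, len(ranked), len(usable_markers), top_n)
    return hits, p_value


# Toy data: 20 made-up genes with strictly decreasing expression.
toy_expression = {f"gene{i}": 20 - i for i in range(20)}
toy_markers = {"gene0", "gene1", "gene2", "gene3"}  # hypothetical marker set
hits, p_value = marker_enrichment(toy_expression, toy_markers, top_n=5)
print(hits, p_value)
```

A small p-value here would only suggest that the marker genes are unusually prominent in the sample. Whether that reflects contamination depends on how well the marker set was chosen, which this sketch does not address.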

MeSH terms

  • Arabidopsis / genetics
  • Arabidopsis / metabolism*
  • Arabidopsis Proteins / genetics
  • Arabidopsis Proteins / metabolism*
  • Endosperm / genetics
  • Endosperm / metabolism
  • Gene Expression Regulation, Plant / genetics
  • Seeds / genetics
  • Seeds / metabolism
  • Software
  • Transcriptome / genetics

Substances

  • Arabidopsis Proteins