Genome-Wide Constitutively Expressed Gene Analysis and New Reference Gene Selection Based on Transcriptome Data: A Case Study from Poplar/Canker Disease Interaction

Front Plant Sci. 2017 Oct 31:8:1876. doi: 10.3389/fpls.2017.01876. eCollection 2017.

Abstract

Transcriptome datasets generated to identify differentially expressed (DE) genes have been widely used to understand organismal biology, but they also contain untapped information that can be used to develop more precise analytical tools. Using transcriptome data generated from the poplar/canker disease interaction system, we describe a methodology to identify candidate reference genes from high-throughput sequencing data. This methodology will improve the accuracy of RT-qPCR and will lead to better standards for the normalization of expression data. Expression stability analysis of xylem and phloem of Populus beijingensis inoculated with the fungal canker pathogen Botryosphaeria dothidea revealed that 729 poplar transcripts (1.11%) were stably expressed, at a threshold of coefficient of variation (CV) of FPKM < 20% and maximum fold change (MFC) of FPKM < 2.0. Expression stability and bioinformatics analyses suggested that commonly used house-keeping (HK) genes were not the most appropriate internal controls: 70 of the 72 commonly used HK genes were not stably expressed, 45 of the 72 produced multiple isoform transcripts, and some of their reported primers produced nonspecific amplicons in PCR amplification. RT-qPCR analysis comparing the expression stability of 10 commonly used poplar HK genes and 20 of the 729 newly identified stably expressed transcripts showed that some of the newly identified genes (such as SSU_S8e, LSU_L5e, and 20S_PSU) had a higher stability ranking than most of the commonly used HK genes. Based on these results, we recommend a pipeline for deriving reference genes from transcriptome data. An appropriate candidate gene should have a unique transcript, constitutive expression, a CV of expression < 20% (or possibly 30%), an MFC of expression < 2, and an expression level of 50-1,000 units.
Lastly, when four of the newly identified HK genes were used to normalize expression data for 20 differentially expressed genes, the expression analysis gave values similar to the Cufflinks output. The methods described here provide an alternative pathway for the normalization of transcriptome data, a process that is essential for integrating analyses of transcriptome data across environments, laboratories, sequencing platforms, and species.
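
The selection criteria stated above (unique transcript, CV of FPKM < 20%, MFC of FPKM < 2, expression level of 50-1,000 units) can be sketched as a simple filter. This is a minimal illustration, not the authors' implementation: the function names, the example transcripts and their FPKM values are hypothetical, the CV is assumed to use the sample standard deviation, the MFC is assumed to be the ratio of the highest to the lowest FPKM across samples, and the expression level is assumed to be the mean FPKM.

```python
from statistics import mean, stdev


def stability_metrics(fpkm_values):
    """Return (CV in percent, maximum fold change) for one transcript.

    Assumes at least two samples. A transcript with zero expression in
    any sample is treated as not constitutively expressed, so both
    metrics are reported as infinity.
    """
    avg = mean(fpkm_values)
    lowest = min(fpkm_values)
    if avg <= 0 or lowest <= 0:
        return float("inf"), float("inf")
    cv_percent = stdev(fpkm_values) / avg * 100.0  # sample SD (assumption)
    mfc = max(fpkm_values) / lowest                # max/min FPKM (assumption)
    return cv_percent, mfc


def is_candidate_reference(fpkm_values, n_isoforms,
                           cv_max=20.0, mfc_max=2.0,
                           expr_min=50.0, expr_max=1000.0):
    """Apply the thresholds named in the abstract to one transcript."""
    cv_percent, mfc = stability_metrics(fpkm_values)
    return (
        n_isoforms == 1                               # unique transcript
        and cv_percent < cv_max                       # CV of FPKM < 20%
        and mfc < mfc_max                             # MFC of FPKM < 2
        and expr_min <= mean(fpkm_values) <= expr_max # 50-1,000 units
    )


# Hypothetical transcripts: (FPKM per sample, number of isoforms).
transcripts = {
    "stable_tx": ([100.0, 110.0, 95.0, 105.0], 1),
    "variable_tx": ([100.0, 300.0, 50.0, 120.0], 1),
    "multi_isoform_tx": ([100.0, 110.0, 95.0, 105.0], 3),
    "low_expr_tx": ([5.0, 5.5, 4.8, 5.2], 1),
}

candidates = [
    name
    for name, (values, isoforms) in transcripts.items()
    if is_candidate_reference(values, isoforms)
]
print(candidates)
```

Relaxing the CV threshold to the 30% alternative mentioned above only requires passing `cv_max=30.0`; candidates selected this way would still need RT-qPCR validation, as described for the 20 transcripts tested in this study.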

Keywords: Botryosphaeria dothidea; differential expression; expression stability; high-throughput sequencing; house-keeping gene; integrate analysis; internal control; poplar.