Transcriptome software results show significant variation among different commercial pipelines

BMC Genomics. 2023 Nov 3;24(1):662. doi: 10.1186/s12864-023-09683-w.

Abstract

Background: We have been documenting biological responses to low-level radiation (natural background) and very low-level radiation (below background). Because these studies test mild external stimuli, we expect relatively mild biological responses. We recently published a transcriptome software comparison study based on RNA-Seq data from a below-background radiation treatment of two model organisms, E. coli and C. elegans (Thawng and Smith, BMC Genomics 23:452, 2022). We reported DNAstar-D (DESeq2 in the DNAstar software pipeline) to be the more conservative, realistic tool for differential gene expression compared to the other transcriptome software packages tested: CLC, Partek and DNAstar-E (edgeR in the DNAstar pipeline). Here we report two follow-up studies, one with a new model organism, Aedes aegypti, and another software package, Azenta, on transcriptome responses to varying dose rates from three different sources of natural radiation.

Results: When E. coli was exposed to varying levels of K40, we again found that the DNAstar-D pipeline yielded a more conservative number of differentially expressed genes (DEGs) and a lower fold-difference than the CLC and DNAstar-E pipelines run in parallel. After a 30-read minimum cutoff criterion was applied to the data, the number of significant DEGs ranged from 0 to 81 with DNAstar-D, from 4 to 117 with DNAstar-E and from 14 to 139 with CLC. In terms of the extent of expression, the highest fold-change DEG was observed in DNAstar-E (19.7-fold), followed by CLC (12.5-fold) and DNAstar-D (4.3-fold). In a recently completed study with Ae. aegypti that included another software package (Azenta), we analyzed the RNA-Seq response to similar sources of low-level radiation and again found that the DNAstar-D pipeline gave the more conservative number and fold-expression of DEGs compared to the other software packages. The number of significant DEGs ranged from 31 to 221 in Azenta, 31 to 237 in CLC, 19 to 252 in DNAstar-E and 0 to 67 in DNAstar-D. The highest fold-change was found in CLC (1,350.9-fold), with DNAstar-E (5.9-fold) and Azenta (5.5-fold) intermediate, and the lowest (4-fold) in DNAstar-D.
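The effect of a minimum read cutoff on DEG counts and maximum fold-change can be illustrated with a minimal sketch. The abstract does not state how the 30-read criterion was applied or which significance threshold was used, so everything here is an assumption for illustration: the `GeneResult` layout, the rule that every sample must reach the cutoff, the adjusted p-value threshold of 0.05, and the toy numbers. It is not any of the pipelines' actual implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GeneResult:
    """One gene's differential expression result (hypothetical layout)."""
    gene: str
    read_counts: List[int]     # raw reads per sample
    log2_fold_change: float
    adjusted_p: float


def passes_read_cutoff(result: GeneResult, min_reads: int = 30) -> bool:
    # Assumption: the gene must reach the minimum read count in every sample.
    return all(count >= min_reads for count in result.read_counts)


def summarize_degs(
    results: List[GeneResult], min_reads: int = 30, alpha: float = 0.05
) -> Tuple[int, float]:
    """Return (number of significant DEGs, highest absolute fold-change)."""
    kept = [
        r for r in results
        if passes_read_cutoff(r, min_reads) and r.adjusted_p < alpha
    ]
    # Convert log2 fold-change to a linear fold-change, ignoring direction.
    max_fold = max((2 ** abs(r.log2_fold_change) for r in kept), default=0.0)
    return len(kept), max_fold


# Toy data, not taken from the study.
example = [
    GeneResult("geneA", [120, 98, 143, 110], 1.0, 0.01),
    GeneResult("geneB", [3, 0, 41, 2], 8.0, 0.001),   # large fold-change from few reads
    GeneResult("geneC", [55, 60, 48, 70], 0.2, 0.60),  # not significant
]

with_cutoff = summarize_degs(example)                  # 30-read cutoff applied
without_cutoff = summarize_degs(example, min_reads=0)  # no read cutoff
print(with_cutoff, without_cutoff)
```

In this toy case the low-count gene contributes the largest fold-change when no cutoff is applied and drops out once the cutoff is enforced, which is the kind of behavior a read cutoff is meant to suppress.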

Conclusions: This study once again highlights the importance of choosing appropriate software for transcriptome analysis. Using three different biological models (bacteria, nematode and mosquito) in four different studies testing very low levels of radiation (Van Voorhies et al., Front Public Health 8:581796, 2020; Thawng and Smith, BMC Genomics 23:452, 2022; current study), the CLC software package produced what appears to be an exaggerated gene expression response in terms of both the number of DEGs and the extent of expression. Setting a 30-read cutoff diminishes this exaggerated response in most of the software packages tested. We have further affirmed that DNAstar-D (DESeq2) gives a more conservative transcriptome expression pattern, which appears more suitable for studies expecting subtle gene expression patterns.

Keywords: Fold-changes; Low radiation; Model organisms; Pipeline; RNA-Seq; Transcriptome software.

MeSH terms

  • Aedes*
  • Animals
  • Caenorhabditis elegans / genetics
  • Escherichia coli / genetics
  • Gene Expression Profiling / methods
  • Sequence Analysis, RNA / methods
  • Software
  • Transcriptome*