Transcriptator: An Automated Computational Pipeline to Annotate Assembled Reads and Identify Non Coding RNA

PLoS One. 2015 Nov 18;10(11):e0140268. doi: 10.1371/journal.pone.0140268. eCollection 2015.

Abstract

RNA-seq is a recent technique that uses high-throughput sequencing to measure RNA transcript counts with extraordinary accuracy. It provides a quantitative means to explore the transcriptome of an organism of interest. However, turning these extremely large datasets into biological knowledge remains a problem, and biologist-friendly tools are lacking. In our lab, we developed Transcriptator, a web application based on a computational Python pipeline with a user-friendly Java interface. This pipeline uses the web services available for the BLAST (Basic Local Alignment Search Tool), QuickGO and DAVID (Database for Annotation, Visualization and Integrated Discovery) tools. It reports a statistical analysis of functional and Gene Ontology (GO) annotation enrichment. It helps users to identify enriched biological themes, particularly GO terms, pathways, domains, gene/protein features and protein-protein interaction information. It clusters the transcripts based on functional annotations and generates a tabular report of functional and gene ontology annotations for each transcript submitted to the web server. The implementation of QuickGO web services in our pipeline enables users to carry out GO-Slim analysis, whereas the integration of PORTRAIT (Prediction of transcriptomic non coding RNA (ncRNA) by ab initio methods) helps to identify non-coding RNAs and their regulatory role in the transcriptome. In summary, Transcriptator is a useful tool for both NGS and array data. It helps users to characterize de novo assembled reads obtained from NGS experiments on non-referenced organisms, and it also performs functional enrichment analysis of differentially expressed transcripts/genes for both RNA-seq and microarray experiments. It generates easy-to-read tables and interactive charts for a better understanding of the data. The pipeline is modular in nature and provides an opportunity to add new plugins in the future.
The web application is freely available at: http://www-labgtp.na.icar.cnr.it/Transcriptator.
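
As an illustration of what a GO term enrichment statistic can look like, the following is a minimal sketch of a plain upper-tail hypergeometric test in Python. It is not taken from Transcriptator or DAVID, whose exact scoring may differ; the function name `enrichment_p_value` and its parameters are hypothetical and chosen only for this example.

```python
from math import comb


def enrichment_p_value(study_hits, study_size, population_hits, population_size):
    """Upper-tail hypergeometric probability P(X >= study_hits).

    study_hits      : transcripts in the submitted list annotated with the GO term
    study_size      : total transcripts in the submitted list
    population_hits : transcripts in the background annotated with the GO term
    population_size : total transcripts in the background
    (All names are illustrative assumptions, not part of any published API.)
    """
    # The study list cannot contain more annotated transcripts than either bound.
    max_hits = min(study_size, population_hits)
    # Number of ways to draw any study list of this size from the background.
    total = comb(population_size, study_size)
    # Sum the probabilities of seeing study_hits or more annotated transcripts.
    tail = sum(
        comb(population_hits, k)
        * comb(population_size - population_hits, study_size - k)
        for k in range(study_hits, max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 3 of 5 submitted transcripts carry a term that
    # annotates 5 of 20 background transcripts.
    print(enrichment_p_value(3, 5, 5, 20))
```

A small p-value suggests the term appears in the submitted list more often than expected by chance; in practice, p-values across many GO terms would also need multiple-testing correction.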

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Gene Ontology
  • Gene Regulatory Networks
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Internet
  • Molecular Sequence Annotation*
  • RNA, Untranslated / chemistry
  • RNA, Untranslated / genetics*
  • Sequence Analysis, RNA
  • Transcriptome*
  • User-Computer Interface*

Substances

  • RNA, Untranslated

Grants and funding

Mario R Guarracino received funding from INTEROMICS flagship project PON02-00612-3461281 and PON02-00619-3470457 and National Research University Higher School of Economics RSF grant 365 14-41-00039.