TEcandidates: prediction of genomic origin of expressed transposable elements using RNA-seq data

Bioinformatics. 2018 Nov 15;34(22):3915-3916. doi: 10.1093/bioinformatics/bty423.

Abstract

Motivation: In recent years, transposable elements (TEs) have been linked to gene regulation. However, estimating the origin of expression of TEs through RNA-seq is complicated by multi-mapping reads arising from their repetitive sequences. Current approaches that address multi-mapping reads focus on quantifying expression rather than on finding its genomic origin. Identifying the genomic origin of expressed TEs could further aid in understanding the role that TEs might have in the cell.

Results: We have developed a new pipeline, TEcandidates, that uses de novo transcriptome assembly to identify which TE instances are expressed, along with their genomic locations, so that they can be included in downstream differential expression (DE) analysis. TEcandidates takes as input the RNA-seq data, the genome sequence and the TE annotation file, and returns a list of coordinates of candidate expressed TEs, a list of the TEs that have been removed, and the genome sequence in which the removed TEs are masked. This masked genome is suited to including TEs in downstream expression analysis, because the ambiguity of reads coming from TEs is significantly reduced in the mapping step of the analysis.
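
The masking step described above can be illustrated with a minimal Python sketch. It is not the TEcandidates implementation. It assumes that coordinates are 0-based and half-open (as in BED files), that masking means replacing bases with N, and that the genome fits in memory as a dictionary of strings. The names mask_intervals and mask_genome, and the toy sequence and coordinates, are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def mask_intervals(sequence, intervals, mask_char="N"):
    """Return a copy of `sequence` with every [start, end) interval masked.

    Coordinates are assumed to be 0-based, half-open (BED-style).
    """
    masked = list(sequence)
    for start, end in intervals:
        if not 0 <= start <= end <= len(sequence):
            raise ValueError(f"interval ({start}, {end}) is outside the sequence")
        # Overwrite the interval with the mask character, keeping the length.
        masked[start:end] = mask_char * (end - start)
    return "".join(masked)


def mask_genome(genome, removed_tes, mask_char="N"):
    """Mask all removed TE intervals in a {chromosome: sequence} dictionary.

    `removed_tes` is an iterable of (chromosome, start, end) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in removed_tes:
        by_chrom[chrom].append((start, end))
    return {
        chrom: mask_intervals(seq, by_chrom.get(chrom, []), mask_char)
        for chrom, seq in genome.items()
    }


# Toy data, invented for illustration only.
toy_genome = {"chr1": "ACGTACGTACGT", "chr2": "TTTTGGGG"}
toy_removed_tes = [("chr1", 2, 5), ("chr1", 8, 10)]

masked_genome = mask_genome(toy_genome, toy_removed_tes)
print(masked_genome["chr1"])  # ACNNNCGTNNGT
print(masked_genome["chr2"])  # TTTTGGGG (no removed TEs on this chromosome)
```

Because masked positions keep the sequence length unchanged, the coordinates of the remaining candidate TEs stay valid against the masked genome, which is what allows it to be used as a drop-in reference for the mapping step.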

Availability and implementation: The script which runs the pipeline can be downloaded at http://www.mobilomics.org/tecandidates/downloads or http://github.com/TEcandidates/TEcandidates.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA Transposable Elements*
  • Gene Expression Regulation
  • Genomics
  • RNA
  • Transcriptome

Substances

  • DNA Transposable Elements
  • RNA