Extension of Partial Gene Transcripts by Iterative Mapping of RNA-Seq Raw Reads

IEEE/ACM Trans Comput Biol Bioinform. 2018 Aug 13. doi: 10.1109/TCBB.2018.2865309. Online ahead of print.

Abstract

Many non-model organisms lack reference genomes, and sequencing and de novo assembly of an organism's transcriptome is an affordable means of characterizing the coding component of its genome. Despite the advances that have made this possible, assembling a transcriptome without a known reference usually yields a mix of full-length and partial gene transcripts. Downstream analysis of genes represented only by partial transcripts then often requires further laboratory work to obtain full-length sequences. We have explored whether partial transcripts encoding genes of interest, present in de novo assembled transcriptomes of a model and a non-model insect species, could be further extended by iterative mapping against the raw transcriptome sequencing reads. Partial sequences encoding cytochrome P450s and carboxyl/cholinesterases were used in this analysis because both are large multigene families whose members vary substantially in expression. We present an effective in silico method for improving the continuity of partial transcripts that, in the absence of a reference genome, may be a quick and cost-effective alternative to extending them by laboratory experimentation. Our approach successfully extended incompletely assembled transcripts, often to full length. We validated these results in silico and experimentally, using real-time PCR and sequencing.
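The abstract does not specify the read mapper, the consensus rules, or the stopping criteria, so the following is only a simplified illustration of the general idea of iterative extension, not the authors' pipeline. It assumes error-free reads and replaces real read alignment with exact matching of a terminal "anchor" k-mer, and quality-aware consensus with a simple majority vote. All names (`extend_right`, `iterative_extend`, `anchor_len`, `min_support`, `max_rounds`) are hypothetical. In each round, reads (and their reverse complements) that contain the contig's terminal anchor contribute the bases they carry beyond it; positions backed by at least `min_support` agreeing reads are appended, and the 5' end is handled by extending the reverse complement.

```python
from collections import Counter


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence over A/C/G/T/N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))


def extend_right(contig: str, reads: list[str], anchor_len: int, min_support: int) -> str:
    """Return consensus bases that reads place beyond the contig's 3' end.

    Hypothetical simplification: a read "maps" if it contains the last
    `anchor_len` bases of the contig as an exact substring.
    """
    anchor = contig[-anchor_len:]
    overhangs = []
    for read in reads:
        pos = read.find(anchor)
        if pos != -1:
            tail = read[pos + len(anchor):]
            if tail:
                overhangs.append(tail)

    extension = []
    max_len = max((len(t) for t in overhangs), default=0)
    for i in range(max_len):
        # Bases observed at offset i beyond the anchor, one per covering read.
        column = [t[i] for t in overhangs if len(t) > i]
        if len(column) < min_support:
            break  # too few reads reach this far
        base, count = Counter(column).most_common(1)[0]
        if count < min_support:
            break  # no sufficiently supported majority base
        extension.append(base)
    return "".join(extension)


def iterative_extend(contig: str, reads: list[str], anchor_len: int = 20,
                     min_support: int = 2, max_rounds: int = 50) -> str:
    """Repeatedly extend both ends of a partial transcript until no read extends it."""
    # Include reverse complements because reads may come from either strand.
    pool = list(reads) + [revcomp(r) for r in reads]
    for _ in range(max_rounds):  # cap rounds to avoid runaway growth in repeats
        right = extend_right(contig, pool, anchor_len, min_support)
        # Extending the reverse complement's 3' end extends the contig's 5' end.
        left = extend_right(revcomp(contig), pool, anchor_len, min_support)
        if not right and not left:
            break
        contig = revcomp(left) + contig + right
    return contig
```

With `min_support=2`, extension stops where fewer than two reads reach, so with tiled reads the outermost couple of bases at each transcript end are typically not recovered; a real implementation would also need to handle sequencing errors, paralogs from the same multigene family, and alternative isoforms, none of which this sketch attempts.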