IsoSplitter: identification and characterization of alternative splicing sites without a reference genome

RNA. 2021 May 21;27(8):868-875. doi: 10.1261/rna.077834.120. Online ahead of print.

Abstract

Long-read transcriptome sequencing is designed to sequence full-length RNA molecules and is advantageous for identifying alternative splice isoforms; however, in the absence of a reference genome, accurately locating splice sites is difficult because patterns of alternative splicing (AS) are diverse. Based on long-read transcriptome data, we developed a versatile tool, IsoSplitter, to reverse-trace and validate AS gene "split-sites" with the following features: (1) IsoSplitter first invokes a modified SIM4 program to find transcript split-sites; (2) each split-site is then quantified to reveal transcript diversity, and putative isoforms are grouped into gene clusters; (3) an optional short-read alignment step validates split-sites by identifying unique junction reads, and reveals and quantifies tissue-specific alternative splice isoforms. We tested IsoSplitter's AS prediction on datasets from multiple model and non-model plant species and showed that the IsoSplitter pipeline handles different transcriptomes efficiently and with high accuracy. Furthermore, using data from the model plant Arabidopsis thaliana, we compared the IsoSplitter pipeline with two splice junction identification tools: the Program to Assemble Spliced Alignments (PASA, which requires a reference genome for AS identification) and AStrap. IsoSplitter identified more than twice as many AS events as AStrap, and 94.13% of the AS events predicted by IsoSplitter were also identified by PASA. IsoSplitter is an assembly-free tool that identifies and characterizes AS starting from a simple sequence file. It is implemented in Python 3.5 for the Linux platform and is freely available at https://github.com/Hengfu-Yin/IsoSplitter.
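The split-site idea behind steps (1) and (3) can be illustrated with a toy Python sketch. It is not IsoSplitter's method: IsoSplitter uses a modified SIM4 aligner and real short-read alignment, whereas this sketch assumes two isoforms of one gene that differ by exactly one contiguous segment (for example a skipped exon or a retained intron) and substitutes exact prefix/suffix matching and plain substring search. The function names, the toy exon sequences, and the `min_overhang` parameter are hypothetical, chosen only for illustration.

```python
# Toy, hypothetical illustration of reference-free split-site detection.
# NOT IsoSplitter's algorithm (which uses a modified SIM4 aligner); it assumes
# the two isoforms differ by exactly one contiguous segment.
from typing import Optional, Tuple


def find_split_site(long_iso: str, short_iso: str) -> Optional[Tuple[int, int, int]]:
    """Locate a single split-site between two isoforms of one gene.

    Returns (seg_start, seg_end, junction): long_iso[seg_start:seg_end] is the
    segment absent from short_iso, and `junction` is the index in short_iso
    where the two flanking sequences meet. Returns None if short_iso is not
    long_iso with exactly one contiguous segment removed.
    """
    if len(long_iso) <= len(short_iso):
        return None
    # Extend the shared prefix (e.g. the upstream exon) as far as possible.
    prefix = 0
    while prefix < len(short_iso) and long_iso[prefix] == short_iso[prefix]:
        prefix += 1
    # Extend the shared suffix, capped so the two flanks never overlap in short_iso.
    suffix = 0
    max_suffix = len(short_iso) - prefix
    while suffix < max_suffix and long_iso[-1 - suffix] == short_iso[-1 - suffix]:
        suffix += 1
    if prefix + suffix != len(short_iso):
        return None  # more than one difference: a real spliced aligner is needed
    return prefix, len(long_iso) - suffix, prefix


def is_unique_junction_read(read: str, long_iso: str, short_iso: str,
                            junction: int, min_overhang: int = 3) -> bool:
    """Hypothetical check: True if `read` spans the junction of short_iso with
    at least `min_overhang` bases on each side and does not occur in long_iso,
    so it supports the short isoform only."""
    if read in long_iso:
        return False
    start = short_iso.find(read)
    while start != -1:
        end = start + len(read)
        if start <= junction - min_overhang and end >= junction + min_overhang:
            return True
        start = short_iso.find(read, start + 1)
    return False


# Made-up toy exons: LONG includes the alternative exon E2, SHORT skips it.
E1, E2, E3 = "ATGAAACCC", "GGGTTT", "AAATAG"
LONG = E1 + E2 + E3
SHORT = E1 + E3

if __name__ == "__main__":
    print(find_split_site(LONG, SHORT))  # (9, 15, 9): LONG[9:15] == "GGGTTT"
    print(is_unique_junction_read("AACCCAAA", LONG, SHORT, junction=9))  # True
```

When the bases on either side of a junction repeat, the exact split position is ambiguous; because the prefix is extended greedily, this sketch reports the most 3'-shifted equivalent position. Real data with sequencing errors and multiple AS events per gene require a proper spliced aligner, which is why IsoSplitter relies on SIM4 rather than exact matching.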

Keywords: Alternative Splicing; Gene Expression; Gene Structure; Sequence Alignment.