AStrap: identification of alternative splicing from transcript sequences without a reference genome

Bioinformatics. 2019 Aug 1;35(15):2654-2656. doi: 10.1093/bioinformatics/bty1008.

Abstract

Summary: Alternative splicing (AS) is a well-established mechanism for increasing transcriptome and proteome diversity, however, detecting AS events and distinguishing among AS types in organisms without available reference genomes remains challenging. We developed a de novo approach called AStrap for AS analysis without using a reference genome. AStrap identifies AS events by extensive pair-wise alignments of transcript sequences and predicts AS types by a machine-learning model integrating more than 500 assembled features. We evaluated AStrap using collected AS events from reference genomes of rice and human as well as single-molecule real-time sequencing data from Amborella trichopoda. Results show that AStrap can identify much more AS events with comparable or higher accuracy than the competing method. AStrap also possesses a unique feature of predicting AS types, which achieves an overall accuracy of ∼0.87 for different species. Extensive evaluation of AStrap using different parameters, sample sizes and machine-learning models on different species also demonstrates the robustness and flexibility of AStrap. AStrap could be a valuable addition to the community for the study of AS in non-model organisms with limited genetic resources.

Availability and implementation: AStrap is available for download at https://github.com/BMILAB/AStrap.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alternative Splicing*
  • Genome*
  • Humans
  • Machine Learning
  • Sequence Analysis, RNA
  • Transcriptome