L-RAPiT: A Cloud-Based Computing Pipeline for the Analysis of Long-Read RNA Sequencing Data

Int J Mol Sci. 2022 Dec 13;23(24):15851. doi: 10.3390/ijms232415851.

Abstract

Long-read sequencing (LRS) has been adopted to meet a wide variety of research needs, ranging from the construction of novel transcriptome annotations to the rapid identification of emerging virus variants. Among other advantages, LRS preserves more information about RNA at the transcript level than conventional high-throughput sequencing, including far more accurate and quantitative records of splicing patterns. New studies with LRS datasets are being published at an exponential rate, generating a vast reservoir of information that can be leveraged to address a host of different research questions. However, mining such publicly available data in a tailored fashion remains difficult, as the available software tools typically require familiarity with the command-line interface, which is a significant obstacle for many researchers. Additionally, different research groups use different software packages to perform LRS analysis, which often prevents a direct comparison of published results across studies. To address these challenges, we have developed the Long-Read Analysis Pipeline for Transcriptomics (L-RAPiT), a user-friendly, free pipeline requiring no dedicated computational resources or bioinformatics expertise. L-RAPiT can be run directly in Google Colaboratory, a system based on the open-source Jupyter notebook environment, and allows for the direct analysis of transcriptomic reads from Oxford Nanopore and PacBio LRS instruments. This new pipeline enables the rapid, convenient, and standardized analysis of publicly available or newly generated LRS datasets.

Keywords: LINC00173; Oxford Nanopore; PacBio; RNA sequencing; RNA-seq; alternative splicing; bioinformatics; computational genomics; long-read sequencing; next-generation sequencing; software.

MeSH terms

  • Cloud Computing*
  • Computational Biology / methods
  • Gene Expression Profiling / methods
  • High-Throughput Nucleotide Sequencing / methods
  • RNA* / genetics
  • Sequence Analysis, RNA
  • Software

Substances

  • RNA