Transformation of alignment files improves performance of variant callers for long-read RNA sequencing data

Genome Biol. 2023 Apr 24;24(1):91. doi: 10.1186/s13059-023-02923-y.

Abstract

Long-read RNA sequencing (lrRNA-seq) produces detailed information about full-length transcripts, including novel and sample-specific isoforms. Furthermore, there is an opportunity to call variants directly from lrRNA-seq data. However, most state-of-the-art variant callers have been developed for genomic DNA. Here, we have two objectives: first, we perform a mini-benchmark of GATK, DeepVariant, Clair3, and NanoCaller, primarily on PacBio Iso-Seq data but also on Nanopore and Illumina RNA-seq data; second, we propose a pipeline to process spliced-alignment files, making them suitable for variant calling with DNA-based callers. With such manipulations, high calling performance can be achieved using DeepVariant on Iso-Seq data.
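
The abstract does not list the individual steps of the proposed pipeline. As a rough illustration of what making a spliced alignment suitable for a DNA-based caller can involve, the sketch below splits one alignment at its intron gaps (CIGAR `N` operations), the same general idea as GATK's SplitNCigarReads. It is a minimal sketch and not the authors' implementation: the function name `split_at_introns` and its inputs are hypothetical, and a real tool would also slice the read sequence and base qualities and adjust flags for each segment.

```python
import re
from typing import List, Tuple

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def split_at_introns(pos: int, cigar: str) -> List[Tuple[int, str]]:
    """Split one spliced alignment into intron-free segments.

    pos   -- 1-based leftmost reference position of the alignment (SAM POS)
    cigar -- CIGAR string, possibly containing N (skipped region) operations

    Returns a list of (segment_start, segment_cigar) pairs, one per exon block.
    Hypothetical helper for illustration only.
    """
    segments: List[Tuple[int, str]] = []
    current: List[str] = []   # CIGAR operations of the segment being built
    seg_start = pos           # reference start of the segment being built
    ref_pos = pos             # running reference coordinate

    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            # An intron gap closes the current segment and starts a new one
            # on the far side of the gap.
            if current:
                segments.append((seg_start, "".join(current)))
            current = []
            ref_pos += n
            seg_start = ref_pos
        else:
            current.append(f"{n}{op}")
            if op in REF_CONSUMING:
                ref_pos += n

    if current:
        segments.append((seg_start, "".join(current)))
    return segments


if __name__ == "__main__":
    for start, seg in split_at_introns(100, "50M1000N30M2D20M"):
        print(start, seg)
```

Each returned segment contains no `N` operation, so a caller built for genomic DNA alignments would see only contiguous alignment blocks rather than reads spanning long gaps.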

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Exome Sequencing
  • High-Throughput Nucleotide Sequencing*
  • RNA*
  • RNA-Seq
  • Sequence Analysis, RNA

Substances

  • RNA