A deep-learning-based RNA-seq germline variant caller

Bioinform Adv. 2023 Jun 13;3(1):vbad062. doi: 10.1093/bioadv/vbad062. eCollection 2023.

Abstract

Summary: RNA sequencing (RNA-seq) can be applied to diverse tasks including quantifying gene expression, discovering quantitative trait loci and identifying gene fusion events. Although RNA-seq can detect germline variants, the complexities of variable transcript abundance, target capture and amplification introduce challenging sources of error. Here, we extend DeepVariant, a deep-learning-based variant caller, to learn and account for the unique challenges presented by RNA-seq data. Our DeepVariant RNA-seq model produces highly accurate variant calls from RNA-seq data and outperforms existing approaches such as Platypus and GATK. We examine factors that influence accuracy, how our model addresses RNA editing events and how additional thresholding can facilitate the model's use in a production pipeline.
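
As an illustration of the thresholding idea mentioned above, the following minimal sketch keeps only VCF records whose first sample meets a minimum genotype quality (GQ). The abstract does not state which quantity is thresholded or what cutoff is used, so the choice of GQ, the default cutoff of 20 and the function name `filter_by_gq` are all assumptions made for illustration, not the authors' method.

```python
from typing import Iterable, Iterator


def filter_by_gq(vcf_lines: Iterable[str], min_gq: int = 20) -> Iterator[str]:
    """Yield VCF header lines plus records whose first sample has GQ >= min_gq.

    Assumption: GQ is the thresholded field and 20 is an arbitrary default;
    neither is taken from the paper.
    """
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Meta-information and column header lines pass through unchanged.
            yield line
            continue
        fields = line.split("\t")
        if len(fields) < 10:
            # No FORMAT/sample columns, so there is no GQ to test.
            continue
        # Pair FORMAT keys (column 9) with the first sample's values (column 10).
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        gq = sample.get("GQ", ".")
        if gq != "." and int(gq) >= min_gq:
            yield line


# Hypothetical records, written only to exercise the filter.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "chr1\t1000\t.\tA\tG\t35.2\tPASS\t.\tGT:GQ:DP\t0/1:34:18",
    "chr1\t2000\t.\tC\tT\t4.1\tPASS\t.\tGT:GQ:DP\t0/1:5:3",
]
kept = list(filter_by_gq(example, min_gq=20))
```

In practice a cutoff of this kind would be tuned on a truth set to balance precision against recall, and a production pipeline would more likely rely on an established VCF toolkit than on hand-rolled parsing.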

Supplementary information: Supplementary data are available at Bioinformatics Advances online.