Benchmarking deep learning splice prediction tools using functional splice assays

Hum Mutat. 2021 Jul;42(7):799-810. doi: 10.1002/humu.24212. Epub 2021 May 20.

Abstract

Hereditary disorders are frequently caused by genetic variants that affect pre-messenger RNA splicing. Although genetic variants in the canonical splice motifs almost always disrupt splicing, the pathogenicity of variants in noncanonical splice sites (NCSS) and deep intronic (DI) regions is difficult to predict. Multiple splice prediction tools have been developed for this purpose, with the latest tools employing deep learning algorithms. We benchmarked established and deep learning splice prediction tools on published gold standard sets of 71 NCSS and 81 DI variants in the ABCA4 gene and 61 NCSS variants in the MYBPC3 gene with functional assessment in midigene and minigene splice assays. The selection of splice prediction tools included CADD, DSSP, GeneSplicer, MaxEntScan, MMSplice, NNSPLICE, SPIDEX, SpliceAI, SpliceRover, and SpliceSiteFinder-like. Based on the area under the receiver operating characteristic curve, the best-performing tools were SpliceRover for ABCA4 NCSS variants, SpliceAI for ABCA4 DI variants, and the Alamut 3/4 consensus approach (GeneSplicer, MaxEntScan, NNSPLICE, and SpliceSiteFinder-like) for NCSS variants in MYBPC3. Overall, the performance in a real-time clinical setting is much more modest than reported by the developers of the tools.
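As an illustration of the ranking criterion named in the abstract, the sketch below computes the area under the receiver operating characteristic curve for each tool from per-variant prediction scores and binary splice-assay outcomes, then picks the tool with the highest value. It is a minimal sketch, not the authors' pipeline: the function `roc_auc`, the tool names `tool_a` and `tool_b`, and all scores and labels are hypothetical and chosen only to make the example runnable.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen splice-altering variant
    (label 1) scores higher than a randomly chosen neutral variant (label 0);
    tied scores count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one variant of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = splice defect observed
# in the functional assay, 0 = no effect observed.
assay_labels = [1, 1, 1, 0, 0, 0]

# Hypothetical per-variant scores from two made-up tools.
tool_scores = {
    "tool_a": [0.9, 0.8, 0.3, 0.4, 0.2, 0.1],
    "tool_b": [0.6, 0.5, 0.4, 0.7, 0.3, 0.2],
}

auc_by_tool = {name: roc_auc(assay_labels, s) for name, s in tool_scores.items()}
best_tool = max(auc_by_tool, key=auc_by_tool.get)

for name, auc in sorted(auc_by_tool.items()):
    print(f"{name}: AUC = {auc:.3f}")
print("best:", best_tool)
```

The pairwise formulation is quadratic in the number of variants, which is acceptable for sets of the size described here (tens of variants per gene and region); a rank-based implementation would be preferable for much larger sets.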

Keywords: ABCA4; MYBPC3; RNA splicing; deep learning; splice prediction tools; variant effect prediction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • ATP-Binding Cassette Transporters / genetics
  • Benchmarking
  • Deep Learning*
  • Humans
  • Introns / genetics
  • Mutation
  • RNA Splice Sites / genetics
  • RNA Splicing

Substances

  • ABCA4 protein, human
  • ATP-Binding Cassette Transporters
  • RNA Splice Sites