Machine Learning Approaches for the Prioritization of Genomic Variants Impacting Pre-mRNA Splicing

Cells. 2019 Nov 26;8(12):1513. doi: 10.3390/cells8121513.

Abstract

Defects in pre-mRNA splicing are a frequent cause of Mendelian disease. Although next-generation sequencing has given deeper insight into a patient's variant landscape, the ability to characterize variants that cause splicing defects has not progressed at the same pace. To address this, recent years have seen a sharp rise in the number of splice prediction tools that leverage machine learning, leaving clinical geneticists with a plethora of choices for in silico analysis. In this review, some basic principles of machine learning are introduced in the context of genomics and splicing analysis. A critical comparative approach is then used to describe seven recent machine learning-based splice prediction tools, revealing highly diverse approaches and common caveats. We find that, although great progress has been made in producing specific and sensitive tools, there is still much scope for personalized approaches to predicting variant impact on splicing. Such approaches may increase diagnostic yields and underpin improvements to patient care.

Keywords: Mendelian disease; RNA splicing; bioinformatics; diagnostics; effect prediction; genomic medicine; machine learning; variant interpretation; variant prioritization.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Computational Biology / methods*
  • Genetic Predisposition to Disease
  • Genetic Variation*
  • Genomics / methods*
  • Humans
  • Machine Learning*
  • Models, Biological
  • Molecular Sequence Annotation
  • RNA Precursors / genetics*
  • RNA Splicing*

Substances

  • RNA Precursors