Modeling and Predicting the Activities of Trans-Acting Splicing Factors with Machine Learning

Cell Syst. 2018 Nov 28;7(5):510-520.e4. doi: 10.1016/j.cels.2018.09.002. Epub 2018 Nov 7.

Abstract

Alternative splicing (AS) is generally regulated by trans-acting splicing factors that specifically bind to cis-elements in pre-mRNAs. The human genome encodes ∼1,500 RNA binding proteins (RBPs) that potentially regulate AS, yet their functions remain largely unknown. To explore their potential activities, we fused the putative functional domains of RBPs to a sequence-specific RNA-binding domain and systematically analyzed how these engineered factors affect splicing. We discovered that ∼80% of low-complexity domains in endogenous RBPs displayed distinct context-dependent activities in regulating splicing, indicating that AS is under more extensive regulation than previously expected. We developed a machine learning approach to classify and predict the activities of RBPs based on their sequence compositions and further validated this model using endogenous RBPs and synthetic polypeptides. These results represent a systematic inspection, modeling, prediction, and validation of how RBP sequences affect their activities in controlling splicing, paving the way for de novo engineering of artificial splicing factors.
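
The abstract does not specify the features or the model behind the machine learning approach. As a rough illustration only of what classifying domains by sequence composition can look like, the sketch below turns each protein sequence into a 20-dimensional amino acid frequency vector and assigns a label with a nearest-centroid rule. The nearest-centroid classifier, the function names, the labels `class_A` and `class_B`, and the toy sequences are all assumptions made for this example; none of them is taken from the paper.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the fraction of each standard amino acid in seq (20 values)."""
    counts = Counter(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two composition vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(labeled):
    """labeled: iterable of (sequence, label). Returns {label: centroid}."""
    groups = {}
    for seq, label in labeled:
        groups.setdefault(label, []).append(composition(seq))
    return {label: centroid(vectors) for label, vectors in groups.items()}


def predict(model, seq):
    """Return the label whose centroid is closest to seq's composition."""
    vec = composition(seq)
    return min(model, key=lambda label: distance(model[label], vec))


# Toy sequences invented for illustration; they are not data from the study.
TOY_TRAINING_SET = [
    ("GGGGYGGGSGGG", "class_A"),
    ("GYGGGGGGFGGG", "class_A"),
    ("RSRSRSRSRSRS", "class_B"),
    ("SRSRSRDRSRSR", "class_B"),
]

model = train(TOY_TRAINING_SET)
```

Calling `predict(model, "GGGSGGGYGG")` on this toy model assigns the glycine-rich query to `class_A`, since its composition lies closest to that centroid. A real analysis would need measured splicing activities as labels and a properly validated model.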

Keywords: RNA binding domains; alternative splicing; machine learning; protein activity prediction; protein engineering; splicing factors.

Publication types

  • Research Support, N.I.H., Intramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology
  • Humans
  • Machine Learning*
  • Models, Genetic*
  • RNA Splicing Factors / metabolism*
  • RNA Splicing*

Substances

  • RNA Splicing Factors