ILGBMSH: an interpretable classification model for the shRNA target prediction with ensemble learning algorithm

Brief Bioinform. 2022 Nov 19;23(6):bbac429. doi: 10.1093/bib/bbac429.

Abstract

Short hairpin RNA (shRNA)-mediated gene silencing is an important technology to achieve RNA interference, in which the design of potent and reliable shRNA molecules plays a crucial role. However, efficient shRNA target selection through biological technology is expensive and time consuming. Hence, it is crucial to develop a more precise and efficient computational method to design potent and reliable shRNA molecules. In this work, we present an interpretable classification model for the shRNA target prediction using the Light Gradient Boosting Machine algorithm called ILGBMSH. Rather than utilizing only the shRNA sequence feature, we extracted 554 biological and deep learning features, which were not considered in previous shRNA prediction research. We evaluated the performance of our model compared with the state-of-the-art shRNA target prediction models. Besides, we investigated the feature explanation from the model's parameters and interpretable method called Shapley Additive Explanations, which provided us with biological insights from the model. We used independent shRNA experiment data from other resources to prove the predictive ability and robustness of our model. Finally, we used our model to design the miR30-shRNA sequences and conducted a gene knockdown experiment. The experimental result was perfectly in correspondence with our expectation with a Pearson's coefficient correlation of 0.985. In summary, the ILGBMSH model can achieve state-of-the-art shRNA prediction performance and give biological insights from the machine learning model parameters.

Keywords: deep learning; ensemble learning; knockdown experiment; shRNA prediction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Machine Learning*
  • RNA, Small Interfering / genetics

Substances

  • RNA, Small Interfering