HLPI-Ensemble: Prediction of human lncRNA-protein interactions based on ensemble strategy

RNA Biol. 2018;15(6):797-806. doi: 10.1080/15476286.2018.1457935. Epub 2018 Jun 6.

Abstract

LncRNAs play important roles in many biological processes and in disease progression by binding to related proteins. However, the experimental methods for studying lncRNA-protein interactions are time-consuming and expensive. Although a few models have been designed to predict ncRNA-protein interactions, they share common drawbacks that limit their predictive performance. In this study, we present a model called HLPI-Ensemble, designed specifically for human lncRNA-protein interactions. HLPI-Ensemble adopts an ensemble strategy based on three mainstream machine learning algorithms, Support Vector Machines (SVM), Random Forests (RF) and Extreme Gradient Boosting (XGB), to generate HLPI-SVM Ensemble, HLPI-RF Ensemble and HLPI-XGB Ensemble, respectively. The results of 10-fold cross-validation show that HLPI-SVM Ensemble, HLPI-RF Ensemble and HLPI-XGB Ensemble achieved AUCs of 0.95, 0.96 and 0.96, respectively, in the test dataset. Furthermore, we compared the performance of the HLPI-Ensemble models with that of previous models on an external validation dataset. The results show that the HLPI-Ensemble models produce far fewer false positives (FPs) than the previous models, and that their other evaluation indicators are also higher. These results further show that the HLPI-Ensemble models are superior to previous models in predicting human lncRNA-protein interactions. HLPI-Ensemble is publicly available at: http://ccsipb.lnu.edu.cn/hlpiensemble/ .
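As a rough illustration of the two ideas the abstract relies on, combining several base models into an ensemble and scoring the result by AUC, the following minimal Python sketch may help. It is not the authors' implementation: the abstract does not say how member outputs are combined, so simple score averaging is assumed here, and the function names (`average_scores`, `auc`) and all numbers are hypothetical toy values rather than data from the study.

```python
from statistics import mean


def average_scores(member_scores):
    """Average per-sample scores across ensemble members.

    Assumption: plain averaging; the abstract does not specify
    how HLPI-Ensemble combines its base models.
    """
    return [mean(sample) for sample in zip(*member_scores)]


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample scores higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy, made-up example: 1 = interacting pair, 0 = non-interacting pair.
labels = [1, 1, 0, 0, 1, 0]

# Hypothetical scores from three base models of the same algorithm.
member_1 = [0.9, 0.4, 0.3, 0.6, 0.8, 0.1]
member_2 = [0.8, 0.7, 0.2, 0.3, 0.6, 0.4]
member_3 = [0.7, 0.6, 0.4, 0.2, 0.9, 0.3]

ensemble = average_scores([member_1, member_2, member_3])

print("member 1 AUC:", auc(labels, member_1))
print("ensemble AUC:", auc(labels, ensemble))
```

In this toy case the first member misranks one positive-negative pair, while the averaged scores rank every positive above every negative. That is the general motivation for ensembling, not a claim about the reported results.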

Keywords: bioinformatics; ensemble strategy; lncRNA; lncRNA-protein interaction; protein.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Nucleic Acid*
  • Humans
  • Models, Biological*
  • RNA, Long Noncoding* / genetics
  • RNA, Long Noncoding* / metabolism
  • RNA-Binding Proteins* / genetics
  • RNA-Binding Proteins* / metabolism
  • Sequence Analysis, RNA / methods*
  • Support Vector Machine*

Substances

  • RNA, Long Noncoding
  • RNA-Binding Proteins

Grants and funding

This work was supported by the Doctor Startup Foundation from Liaoning Province, 20170520217; Innovation Team Project of Education Department of Liaoning Province, LT2015011; Large-scale Equipment Shared Services Project, F15165400; Important Scientific and Technical Achievements Transformation Project, Z17-5-078; Applied Basic Research Key Project of Yunnan, F16205151; National Natural Science Foundation of China, 31570160 and 61772531.