A robust boosting regression tree with applications in quantitative structure-activity relationship studies of organic compounds

J Chem Inf Model. 2011 Apr 25;51(4):816-28. doi: 10.1021/ci100429u. Epub 2011 Mar 18.

Abstract

Regression trees (RT) have been extensively used in quantitative structure-activity relationship (QSAR) studies because of their inherently promising attributes. However, RT often suffers from instability and a tendency toward overfitting and suboptimal solutions. In the present study, a robust version of boosting was used to improve both the stability and the generalization ability of RT, forming a new method called robust boosting regression tree (RBRT). RBRT works by sequentially applying the RT method to robustly reweighted versions of the original training set and then aggregating the resulting predictors via a weighted median. RBRT was applied to predict the bioactivities of flavonoid derivatives and the anti-HIV activities of HIV-1 inhibitors, and was compared with boosting RT (BRT) and RT. The results on these two data sets demonstrated that the introduction of robust boosting drastically enhances the stability and generalization ability of RT, and that RBRT is superior to BRT in QSAR studies.
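
The fit, reweight, and aggregate loop described above can be illustrated with a minimal sketch. The abstract does not specify the robust reweighting rule, so this sketch substitutes the standard AdaBoost.R2-style update with a linear loss; it is not the authors' RBRT. A one-split stump on a single descriptor stands in for a full regression tree, and all function names (weighted_median, fit_stump, boost, predict) are hypothetical.

```python
import math

def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(weights)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]

def fit_stump(xs, ys, ws):
    """Weighted one-split regression stump on one feature (stand-in for RT)."""
    best = None
    for t in sorted(set(xs)):
        left = [(y, w) for x, y, w in zip(xs, ys, ws) if x <= t]
        right = [(y, w) for x, y, w in zip(xs, ys, ws) if x > t]
        if not right:
            continue
        lm = sum(y * w for y, w in left) / sum(w for _, w in left)
        rm = sum(y * w for y, w in right) / sum(w for _, w in right)
        err = (sum(w * (y - lm) ** 2 for y, w in left)
               + sum(w * (y - rm) ** 2 for y, w in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds=10):
    """Sequentially fit stumps to reweighted data (AdaBoost.R2-style, assumed)."""
    n = len(xs)
    ws = [1.0 / n] * n
    models, confidences = [], []
    for _ in range(rounds):
        model = fit_stump(xs, ys, ws)
        abs_err = [abs(model(x) - y) for x, y in zip(xs, ys)]
        max_err = max(abs_err)
        if max_err < 1e-12:  # (near-)perfect fit: keep it and stop
            models.append(model)
            confidences.append(1.0)
            break
        losses = [e / max_err for e in abs_err]  # linear loss scaled to [0, 1]
        avg_loss = sum(w * l for w, l in zip(ws, losses))
        if avg_loss >= 0.5:  # learner too weak; keep it only if it is the first
            if not models:
                models.append(model)
                confidences.append(1.0)
            break
        beta = avg_loss / (1.0 - avg_loss)
        models.append(model)
        confidences.append(math.log(1.0 / beta))
        # Well-predicted samples are down-weighted; poorly predicted ones keep weight.
        ws = [w * beta ** (1.0 - l) for w, l in zip(ws, losses)]
        total = sum(ws)
        ws = [w / total for w in ws]
    return models, confidences

def predict(models, confidences, x):
    """Aggregate the individual predictions via a weighted median."""
    return weighted_median([m(x) for m in models], confidences)
```

Aggregating by weighted median rather than weighted mean means a single badly fitted predictor cannot drag the ensemble output arbitrarily far, which is one reason this aggregation is common in boosting for regression.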

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Anti-HIV Agents / chemistry
  • Enzyme Inhibitors / chemistry*
  • Flavones / chemistry
  • Models, Molecular
  • Models, Statistical
  • Molecular Structure
  • Quantitative Structure-Activity Relationship*
  • Regression Analysis

Substances

  • Anti-HIV Agents
  • Enzyme Inhibitors
  • Flavones