The effect of leverage and/or influential on structure-activity relationships

Comb Chem High Throughput Screen. 2013 May;16(4):288-97. doi: 10.2174/1386207311316040003.

Abstract

In the spirit of reporting valid and reliable Quantitative Structure-Activity Relationship (QSAR) models, the aim of our research was to assess how leverage (analyzed with the hat matrix, h(i)) and influence (analyzed with Cook's distance, D(i)) in QSAR models reflect the models' reliability and characteristics. The datasets were collected from previously published papers. Seven datasets that met the inclusion criteria were analyzed. Three models were obtained for each dataset (full-model, h(i)-model and D(i)-model), and several statistical validation criteria were applied to the models. In 5 out of 7 sets the correlation coefficient increased when compounds with either h(i) or D(i) higher than the threshold were removed. The number of withdrawn compounds varied from 2 to 4 for h(i)-models and from 1 to 13 for D(i)-models. Validation statistics showed that D(i)-models possess systematically better agreement than both full-models and h(i)-models. Removal of influential compounds from the training set significantly improves the model and is recommended during the development of quantitative structure-activity relationships. The Cook's distance approach should be combined with hat matrix analysis to identify the compounds that are candidates for removal.
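As an illustration of the two diagnostics named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes an ordinary least-squares model with an intercept, computes h(i) as the diagonal of the hat matrix and D(i) from the standard Cook's distance formula, and flags compounds above a threshold. The function names and the default cutoffs (3p/n for leverage, 4/n for Cook's distance) are assumptions chosen for the example; the abstract does not state which thresholds were used.

```python
import numpy as np

def leverage_and_cooks(X, y):
    """Return (h, D, p) for an OLS fit with intercept.

    h: hat-matrix diagonal (leverage), D: Cook's distance,
    p: number of fitted parameters (descriptors + intercept).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])   # design matrix with intercept
    p = A.shape[1]
    # H = A (A^T A)^-1 A^T = Q Q^T, so h_i is the squared row norm of Q.
    Q, _ = np.linalg.qr(A)
    h = np.sum(Q ** 2, axis=1)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    mse = resid @ resid / (n - p)           # residual variance estimate
    D = (resid ** 2 / (p * mse)) * h / (1.0 - h) ** 2
    return h, D, p

def flag_compounds(X, y, h_factor=3.0, d_factor=4.0):
    """Indices with h_i > h_factor*p/n or D_i > d_factor/n.

    Both cutoffs are illustrative assumptions, not values from the paper.
    """
    h, D, p = leverage_and_cooks(X, y)
    n = len(h)
    high_leverage = np.where(h > h_factor * p / n)[0]
    influential = np.where(D > d_factor / n)[0]
    return high_leverage, influential

# Hypothetical single-descriptor data; the last compound lies far from the rest.
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y_demo = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 10.0])
h_demo, D_demo, p_demo = leverage_and_cooks(x_demo, y_demo)
```

Under this sketch, an h(i)-model or D(i)-model in the sense of the abstract would be obtained by refitting after dropping the flagged indices, then comparing validation statistics against the full model.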

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Linear Models
  • Models, Molecular
  • Quantitative Structure-Activity Relationship*
  • Reproducibility of Results