A combined Fisher and Laplacian score for feature selection in QSAR based drug design using compounds with known and unknown activities

J Comput Aided Mol Des. 2018 Feb;32(2):375-384. doi: 10.1007/s10822-017-0094-6. Epub 2017 Dec 26.

Abstract

Quantitative structure-activity relationship (QSAR) modeling is an effective computational technique for drug design that relates the chemical structures of compounds to their biological activities. Feature selection is an important step in QSAR-based drug design because it identifies the most relevant descriptors. One of the most popular feature selection methods for classification problems is the Fisher score, which aims to minimize the within-class distance and maximize the between-class distance. In this study, the properties of the Fisher criterion were extended to QSAR models by defining new distance metrics based on the continuous activity values of compounds with known activities. A semi-supervised feature selection method was then proposed that combines the Fisher and Laplacian criteria and exploits compounds with both known and unknown activities to select the relevant descriptors. To demonstrate the efficiency of the proposed method, we applied it and other feature selection methods to three QSAR data sets: serine/threonine-protein kinase PLK3 inhibitors, ROCK inhibitors, and phenol compounds. The QSAR models built on the descriptors selected by the proposed semi-supervised method performed better than the other models, which indicates that the method selects relevant descriptors efficiently. These results show that using compounds with both known and unknown activities can improve the performance of combined Fisher and Laplacian based feature selection methods.
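
As a rough illustration of the two ingredients named above, the following Python sketch computes the textbook Fisher score (on labeled compounds, with discrete class labels) and the textbook Laplacian score (on all compounds, labeled or not) for each descriptor. It is not the method of the paper: the abstract does not give the continuous-activity extension of the Fisher criterion or the actual combination rule, so the function `combined_ranking`, its weight `lam`, the neighbourhood size `k`, the kernel width `t`, and the toy data are all assumptions made only for this example.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def fisher_score(X, y):
    """Textbook Fisher score per feature (higher is better).

    Ratio of between-class scatter to within-class scatter,
    computed from labeled samples with discrete class labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    return between / (within + EPS)


def laplacian_score(X, k=5, t=1.0):
    """Textbook Laplacian score per feature (lower is better).

    Uses a symmetrized k-nearest-neighbour graph with heat-kernel
    weights; no labels are needed.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    masked = sq.copy()
    np.fill_diagonal(masked, np.inf)           # a sample is not its own neighbour
    idx = np.argsort(masked, axis=1)[:, :k]    # k nearest neighbours per sample
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-sq[rows, cols] / t)
    S = np.maximum(S, S.T)                     # make the graph undirected
    d = S.sum(axis=1)                          # node degrees
    L = np.diag(d) - S                         # graph Laplacian
    Xc = X - (d @ X) / d.sum()                 # remove degree-weighted mean
    num = (Xc * (L @ Xc)).sum(axis=0)
    den = (Xc * (d[:, None] * Xc)).sum(axis=0)
    return num / (den + EPS)


def combined_ranking(X_labeled, y, X_unlabeled, lam=1.0, k=5, t=1.0):
    """Hypothetical combination, NOT the paper's criterion.

    Rewards a high Fisher score on labeled data and a low Laplacian
    score on labeled plus unlabeled data; returns feature indices,
    best first.
    """
    X_all = np.vstack([X_labeled, X_unlabeled])
    fs = fisher_score(X_labeled, y)
    ls = laplacian_score(X_all, k=k, t=t)
    score = fs / (fs.max() + EPS) - lam * ls / (ls.max() + EPS)
    return np.argsort(-score)


# Toy data (assumed): descriptor 0 separates two groups, descriptor 1 is noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 10)
X_labeled = np.column_stack([
    np.where(y == 0, -5.0, 5.0) + rng.normal(0.0, 0.1, 20),
    rng.uniform(-1.0, 1.0, 20),
])
X_unlabeled = np.column_stack([
    np.where(y == 0, -5.0, 5.0) + rng.normal(0.0, 0.1, 20),
    rng.uniform(-1.0, 1.0, 20),
])

fs = fisher_score(X_labeled, y)
ls = laplacian_score(np.vstack([X_labeled, X_unlabeled]))
ranking = combined_ranking(X_labeled, y, X_unlabeled)
```

In this sketch the unlabeled compounds contribute only through the neighbourhood graph used by the Laplacian score, which is one simple way a semi-supervised criterion can draw on compounds with unknown activities.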

Keywords: Feature selection; Fisher criterion; Graph Laplacian; QSAR models; Semi-supervised.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Databases, Chemical
  • Drug Design
  • Models, Molecular*
  • Molecular Structure
  • Phenol / metabolism
  • Protein Kinase Inhibitors / metabolism
  • Protein Serine-Threonine Kinases / antagonists & inhibitors
  • Quantitative Structure-Activity Relationship*
  • Research Design / statistics & numerical data*
  • rho-Associated Kinases / antagonists & inhibitors

Substances

  • Protein Kinase Inhibitors
  • Phenol
  • Protein Serine-Threonine Kinases
  • rho-Associated Kinases