Evolving interpretable structure-activity relationship models. 2. Using multiobjective optimization to derive multiple models

J Chem Inf Model. 2008 Aug;48(8):1558-70. doi: 10.1021/ci800051h. Epub 2008 Jul 19.

Abstract

A multiobjective evolutionary algorithm (MOEA) is described for evolving multiple structure-activity relationships (SARs). The SARs are encoded in easy-to-interpret reduced graph queries, which describe features that are preferentially present in active compounds compared to inactives. The MOEA addresses a limitation associated with many machine learning methods: the inherent tradeoff between recall and precision, which is usually handled by combining the two objectives into a single measure, with a consequent loss of control. By simultaneously optimizing recall and precision, the MOEA generates a family of SARs that lie on the precision-recall (PR) curve. The user is then able to select a query with an appropriate balance between the two objectives: for example, a low recall-high precision query may be preferred when establishing the SAR, whereas a high recall-low precision query may be more appropriate in a virtual screening context. Each query on the PR curve aims to capture the structure-activity information in a single representation, and each can be considered an alternative (equally valid) solution. We then investigate combining individual queries into teams with the aim of capturing multiple SARs that may exist in a data set, as is commonly seen, for example, in high-throughput screening data sets. Team formation is carried out iteratively as a postprocessing step following the evolution of the individual queries. The inclusion of uniqueness as a third objective within the MOEA provides an effective way of ensuring that the queries are complementary in the active compounds they describe. Substantial improvements in both recall and precision are seen for some data sets. Furthermore, the resulting queries provide more detailed structure-activity information than is present in a single query.
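To illustrate what it means for a family of queries to lie on the PR curve, the following is a minimal sketch of scoring queries by recall and precision and keeping only the non-dominated ones. It is not the authors' implementation: each query is reduced to the set of compound IDs it matches (the reduced graph matching itself is not modeled), and all names and data (`score_query`, `pareto_front`, the toy compound IDs) are hypothetical.

```python
from typing import NamedTuple


class Score(NamedTuple):
    name: str
    recall: float
    precision: float


def score_query(name, matched, actives):
    """Recall and precision of one query, given the compound IDs it matches."""
    true_pos = len(matched & actives)
    recall = true_pos / len(actives) if actives else 0.0
    precision = true_pos / len(matched) if matched else 0.0
    return Score(name, recall, precision)


def dominates(a, b):
    """True if a is no worse than b on both objectives and better on one."""
    no_worse = a.recall >= b.recall and a.precision >= b.precision
    better = a.recall > b.recall or a.precision > b.precision
    return no_worse and better


def pareto_front(scores):
    """Keep the scores that no other score dominates, ordered by recall."""
    front = [s for s in scores if not any(dominates(o, s) for o in scores)]
    return sorted(front, key=lambda s: s.recall)


# Hypothetical toy data: compounds 1-4 are active, all others inactive.
actives = {1, 2, 3, 4}
queries = {
    "q_specific": {1, 2},                   # low recall, high precision
    "q_balanced": {1, 2, 3, 9},             # intermediate on both
    "q_broad": {1, 2, 3, 4, 7, 8, 9, 10},   # high recall, low precision
    "q_poor": {1, 7, 8, 9},                 # worse than q_balanced on both
}

scores = [score_query(n, m, actives) for n, m in queries.items()]
front = pareto_front(scores)
front_names = [s.name for s in front]

for s in front:
    print(f"{s.name}: recall={s.recall:.2f} precision={s.precision:.2f}")
```

In this toy case `q_poor` is dropped because `q_balanced` beats it on both objectives, while the other three remain as alternative tradeoffs: a user establishing the SAR might pick `q_specific`, and one running a virtual screen might pick `q_broad`.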

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Humans
  • Models, Biological*
  • Molecular Structure
  • Receptors, Serotonin, 5-HT1 / metabolism
  • Serotonin 5-HT1 Receptor Agonists
  • Structure-Activity Relationship

Substances

  • Receptors, Serotonin, 5-HT1
  • Serotonin 5-HT1 Receptor Agonists