Hybridizing Feature Selection and Feature Learning Approaches in QSAR Modeling for Drug Discovery

Ignacio Ponzoni; Víctor Sebastián-Pérez; Carlos Requena-Triguero; Carlos Roca; María J Martínez; Fiorella Cravero; Mónica F Díaz; Juan A Páez; Ramón Gómez Arrayás; Javier Adrio; Nuria E Campillo

doi:10.1038/s41598-017-02114-3

Sci Rep. 2017 May 25;7(1):2403. doi: 10.1038/s41598-017-02114-3.

Affiliations

¹ Instituto de Ciencias e Ingeniería de la Computación (ICIC), Universidad Nacional del Sur-CONICET, San Andrés 800 - Campus Palihue, 8000, Bahía Blanca, Argentina. ip@cs.uns.edu.ar.
² Centro de Investigaciones Biológicas, Consejo Superior de Investigaciones Científicas (CSIC), Ramiro de Maeztu 9, 28040, Madrid, Spain.
³ Instituto de Ciencias e Ingeniería de la Computación (ICIC), Universidad Nacional del Sur-CONICET, San Andrés 800 - Campus Palihue, 8000, Bahía Blanca, Argentina.
⁴ Planta Piloto de Ingeniería Química (PLAPIQUI), Universidad Nacional del Sur-CONICET, Co. La Carrindanga km.7, CC 717, Bahía Blanca, Argentina.
⁵ Instituto de Química Médica, Consejo Superior de Investigaciones Científicas (CSIC), Juan de la Cierva 3, 28006, Madrid, Spain.
⁶ Departamento de Química Orgánica, Universidad Autónoma de Madrid (UAM), Cantoblanco, 28049, Madrid, Spain.
⁷ Institute for Advanced Research in Chemical Sciences (IAdChem), UAM, 28049, Madrid, Spain.
⁸ Centro de Investigaciones Biológicas, Consejo Superior de Investigaciones Científicas (CSIC), Ramiro de Maeztu 9, 28040, Madrid, Spain. nuria.campillo@csic.es.

Abstract

Quantitative structure-activity relationship (QSAR) modeling using machine learning techniques is a complex computational problem in which the identification of the most informative molecular descriptors for predicting a specific target property plays a critical role. Two general approaches can be used for this modeling procedure: feature selection and feature learning. In this paper, a comparative performance study of two state-of-the-art methods representing these two approaches is carried out. In particular, regression and classification models for three different properties are inferred using both methods under different experimental scenarios: two drug-like properties, blood-brain barrier and human intestinal absorption, and enantiomeric excess, a measure of purity used for chiral substances. Beyond comparing feature selection and feature learning as competing approaches, the hybridization of these strategies is also evaluated, motivated by previous results obtained in materials science. From the experimental results, it can be concluded that there is no clear winner between the two approaches, because performance depends on the characteristics of the compound databases used for modeling. Nevertheless, in several cases it was observed that the accuracy of the models can be improved by combining both approaches when the molecular descriptor sets provided by feature selection and feature learning contain complementary information.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Blood-Brain Barrier / drug effects
Blood-Brain Barrier / metabolism
Chemical Phenomena
Drug Discovery* / methods
Humans
Intestinal Absorption / drug effects
Machine Learning*
Models, Molecular*
Quantitative Structure-Activity Relationship*
Software