Kernel Target Alignment Parameter: A New Modelability Measure for Regression Tasks

Gilles Marcou; Dragos Horvath; Alexandre Varnek

doi:10.1021/acs.jcim.5b00539

J Chem Inf Model. 2016 Jan 25;56(1):6-11. doi: 10.1021/acs.jcim.5b00539. Epub 2015 Dec 23.

Authors

Gilles Marcou¹, Dragos Horvath¹, Alexandre Varnek¹ ²

Affiliations

¹ Laboratory of Chemoinformatics, University of Strasbourg, 1 rue Blaise Pascal, 67000 Strasbourg, France.
² Laboratory of Chemoinformatics, Federal University of Kazan, Kremlevskaya str. 18, 420008 Kazan, Russia.

PMID: 26673976
DOI: 10.1021/acs.jcim.5b00539

Abstract

In this paper, we demonstrate that the kernel target alignment (KTA) parameter can be used efficiently to estimate the relevance of molecular descriptors for QSAR modeling on a given data set, i.e., as a modelability measure. The ability of KTA to assess modelability was demonstrated in two series of QSAR modeling studies: one varying the descriptor space for a single data set, the other comparing various data sets within a single descriptor space. The data sets considered included 25 series of various GPCR binders with ChEMBL-reported pKi values and a toxicity data set. The descriptor spaces employed covered more than 100 different ISIDA fragment descriptor types and ChemAxon BCUT terms. Model performance, measured as RMSE, anticorrelated consistently with the KTA parameter. Two other modelability measures were employed for benchmarking purposes: the average Jaccard distance over the data set (Div), and a measure related to the normalized mean absolute error (MAE) obtained in 1-nearest-neighbor calculations on the training set (Sim = 1 - MAE). Both Div and Sim were found to perform similarly to KTA. However, a consensus index combining KTA, Div, and Sim provides a more robust correlation with RMSE than any of the individual modelability measures.
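
As an illustration of the quantity discussed in the abstract, the following is a minimal sketch of a kernel target alignment calculation. It assumes the commonly used definition of alignment, the Frobenius inner product between a kernel matrix K and the target kernel y yᵀ divided by the product of their Frobenius norms, and it assumes that the property values y are mean-centered first. The abstract does not state the exact formulation used in the paper, so the function name `kernel_target_alignment`, the centering step, and the toy inputs are all hypothetical choices made for this example.

```python
import numpy as np


def kernel_target_alignment(K, y):
    """Alignment between a kernel matrix K (n x n) and the target kernel y y^T.

    Assumed (textbook-style) definition, not necessarily the paper's:
        KTA = <K, y y^T>_F / (||K||_F * ||y y^T||_F)
    """
    K = np.asarray(K, dtype=float)
    y = np.asarray(y, dtype=float)
    if K.shape != (y.size, y.size):
        raise ValueError("K must be an n x n matrix matching the length of y")

    y = y - y.mean()                 # assumption: center the property values
    target = np.outer(y, y)          # "ideal" kernel built from the targets
    numerator = np.sum(K * target)   # Frobenius inner product <K, y y^T>
    denominator = np.linalg.norm(K) * np.linalg.norm(target)
    if denominator == 0.0:
        raise ValueError("alignment is undefined for a zero kernel or constant y")
    return numerator / denominator


# Toy usage with a linear kernel on made-up descriptor vectors.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([5.1, 6.0, 6.4, 7.2])
K_linear = X @ X.T
kta_value = kernel_target_alignment(K_linear, y)
```

Under this definition, a higher alignment means that compounds the kernel treats as similar also tend to have similar property values, which fits the abstract's observation that RMSE anticorrelates with KTA.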

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Drug Design
Models, Theoretical*
Quantitative Structure-Activity Relationship*
Regression Analysis
Serotonin / metabolism
Tetrahymena pyriformis / drug effects
Toxicity Tests

Substances

Serotonin