Classifying large chemical data sets: using a regularized potential function method

J Chem Inf Model. 2011 Jan 24;51(1):4-14. doi: 10.1021/ci100022u. Epub 2010 Dec 15.

Abstract

In recent years classifiers generated with kernel-based methods, such as support vector machines (SVM), Gaussian processes (GP), regularization networks (RN), and binary kernel discrimination (BKD) have been very popular in chemoinformatics data analysis. Aizerman et al. were the first to introduce the notion of employing kernel-based classifiers in the area of pattern recognition. Their original scheme, which they termed the potential function method (PFM), can basically be viewed as a kernel-based perceptron procedure and arguably subsumes the modern kernel-based algorithms. PFM can be computationally much cheaper than modern kernel-based classifiers; furthermore, PFM is far simpler conceptually and easier to implement than the SVM, GP, and RN algorithms. Unfortunately, unlike, e.g., SVM, GP, and RN, PFM is not endowed with both theoretical guarantees and practical strategies to safeguard it against generating overfitting classifiers. This is, in our opinion, the reason why this simple and elegant method has not been taken up in chemoinformatics. In this paper we empirically address this drawback: while maintaining its simplicity, we demonstrate that PFM combined with a simple regularization scheme may yield binary classifiers that can be, in practice, as efficient as classifiers obtained by employing state-of-the-art kernel-based methods. Using a realistic classification example, the augmented PFM was used to generate binary classifiers. Using a large chemical data set, the generalization ability of PFM classifiers were then compared with the prediction power of Laplacian-modified naive Bayesian (LmNB), Winnow (WN), and SVM classifiers.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chemistry / methods*
  • Classification / methods*
  • Decision Making
  • Discriminant Analysis
  • Informatics / methods*
  • Nonlinear Dynamics