Binary classification of imbalanced datasets using conformal prediction

J Mol Graph Model. 2017 Mar:72:256-265. doi: 10.1016/j.jmgm.2017.01.008. Epub 2017 Jan 6.

Abstract

Aggregated Conformal Prediction is used as an effective alternative to more complicated and/or ambiguous methods that rely on various balancing measures when modelling severely imbalanced datasets. Explicit balancing measures beyond those already a part of the Conformal Prediction framework are shown not to be required. The Aggregated Conformal Prediction procedure appears to be a promising approach for severely imbalanced datasets, retrieving a large majority of the active minority-class compounds while avoiding information loss or distortion.
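The general idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that the balancing built into the framework is class-conditional (Mondrian) calibration, in which each class is calibrated only against its own examples, and that aggregation takes the median p-value over repeated random splits. Neither detail is stated in the abstract. A nearest-centroid distance stands in for the support vector machine named in the keywords so that the example runs with the standard library alone. All function names and parameters (`acp_p_values`, `prediction_set`, `n_models`, `cal_fraction`) are hypothetical.

```python
import random
from statistics import median


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5


def acp_p_values(X, y, x_new, n_models=10, cal_fraction=0.3, seed=0):
    """Aggregated class-conditional conformal p-values for one new example.

    Assumptions (not from the abstract): nonconformity is the distance to the
    class centroid, calibration is done per class, and p-values from the
    individual models are combined with the median.
    """
    rng = random.Random(seed)
    labels = sorted(set(y))
    per_model = {c: [] for c in labels}

    for _ in range(n_models):
        train_idx, cal_idx = [], []
        # Stratified split: every class keeps its own calibration examples,
        # so no over- or undersampling of the training data is needed.
        for c in labels:
            idx = [i for i, yi in enumerate(y) if yi == c]
            if len(idx) < 2:
                raise ValueError("each class needs at least two examples")
            rng.shuffle(idx)
            n_cal = min(max(1, int(len(idx) * cal_fraction)), len(idx) - 1)
            cal_idx += idx[:n_cal]
            train_idx += idx[n_cal:]

        centroids = {
            c: centroid([X[i] for i in train_idx if y[i] == c]) for c in labels
        }

        for c in labels:
            # Compare the new example only with calibration examples of class c.
            cal_scores = [distance(X[i], centroids[c]) for i in cal_idx if y[i] == c]
            alpha = distance(x_new, centroids[c])
            n_geq = sum(1 for s in cal_scores if s >= alpha)
            per_model[c].append((n_geq + 1) / (len(cal_scores) + 1))

    return {c: median(ps) for c, ps in per_model.items()}


def prediction_set(p_values, significance):
    """Labels whose aggregated p-value exceeds the significance level."""
    return {c for c, p in p_values.items() if p > significance}
```

Calling `acp_p_values(X, y, x_new)` returns one p-value per class, and `prediction_set(p_values, 0.2)` keeps every label that cannot be rejected at the 20% significance level. Because the minority class is calibrated against its own examples only, its p-values are not swamped by the majority class, which is the sense in which no additional balancing step appears in this sketch.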

Keywords: Aggregated conformal prediction; Imbalanced datasets; QSAR; Signature descriptors; Support vector machines.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Chemical*
  • Molecular Conformation*
  • Support Vector Machine