Scalable Prediction of Compound-protein Interaction on Compressed Molecular Fingerprints

Mol Inform. 2020 Jan;39(1-2):e1900130. doi: 10.1002/minf.201900130. Epub 2020 Jan 7.

Abstract

Predicting compound-protein interactions from fingerprints has become a challenging task in pharmaceutical science for efficient drug discovery. We review two scalable methods for predicting compound-protein interactions on fingerprints. In particular, we introduce two techniques for learning statistical models using lossless and lossy data compression. The first is a method that uses a trie representation of fingerprints, which enables predictive models to be learned directly on the compressed format. The second is a method that uses lossy data compression called feature maps (FMs). Recently, quite a few FMs for kernel approximation have been proposed, and minwise hashing, one method of this kind, has been applied to the prediction of compound-protein interactions, where it has proven effective. Overall, we show that learning statistical models on compressed formats is effective for predicting compound-protein interactions at large scale.
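
As a rough illustration of the lossy approach, the following minimal sketch shows how minwise hashing can compress a binary fingerprint, treated here as the set of its on-bit indices, into a short signature whose agreement rate approximates the Jaccard similarity of the original fingerprints. It is not the implementation reviewed in the article: the function names, the linear hash family, and the choice of prime are assumptions made for this example only.

```python
import random

# Assumption: a Mersenne prime (2^31 - 1) larger than any bit index used here.
PRIME = 2_147_483_647


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(on_bits, params):
    """Compress a fingerprint (set of on-bit indices) into one minimum per hash."""
    if not on_bits:
        raise ValueError("fingerprint must have at least one on-bit")
    return [min((a * x + b) % PRIME for x in on_bits) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; approximates Jaccard similarity."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


def exact_jaccard(bits_a, bits_b):
    """Exact Jaccard similarity on the uncompressed fingerprints, for comparison."""
    return len(bits_a & bits_b) / len(bits_a | bits_b)


if __name__ == "__main__":
    # Hypothetical fingerprints, given as sets of on-bit indices.
    compound_a = {1, 5, 9, 42, 128, 300}
    compound_b = {1, 5, 9, 77, 128, 512}

    params = make_hash_params(num_hashes=256, seed=0)
    sig_a = minhash_signature(compound_a, params)
    sig_b = minhash_signature(compound_b, params)

    print("estimated:", estimate_jaccard(sig_a, sig_b))
    print("exact:    ", exact_jaccard(compound_a, compound_b))
```

The signature length is fixed by the number of hash functions rather than by the fingerprint dimension, which is what makes the representation compact. In a kernel-approximation setting the signature entries would typically be encoded further into a feature vector for a linear model; that step is omitted here.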

Keywords: Compound-protein interaction prediction; data compression; drug discovery.

Publication types

  • Review

MeSH terms

  • Algorithms
  • Humans
  • Models, Statistical
  • Pharmaceutical Preparations / chemistry*
  • Protein Interaction Maps
  • Proteins / chemistry*

Substances

  • Pharmaceutical Preparations
  • Proteins