Compression of molecular fingerprints with autoencoder networks

Mol Inform. 2023 Jun;42(6):e2300059. doi: 10.1002/minf.202300059. Epub 2023 Jun 7.

Abstract

Several binary molecular fingerprints were compressed using an autoencoder neural network. We analyzed the impact of compression on fingerprint performance in downstream classification and regression tasks. The performance of classifiers trained on compressed fingerprints was negligibly affected. Regression models benefitted from compression, especially of long fingerprints (Morgan, RDK). However, their performance dropped rapidly for compression levels exceeding 90 %. Property co-learning positively influenced the predictive power of the compressed fingerprints, with a mean score improvement of up to 20 %. This suggests that autoencoder compression with property co-learning biases the molecular representation toward the predicted target, facilitating downstream training.
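
The two quantities the abstract relies on, the compression level and a co-learning objective, can be illustrated with a minimal sketch. It is not the authors' implementation: the definition of compression level as the fraction of the original length removed, the choice of binary cross-entropy for reconstruction, the squared-error property term, and the names `compression_level`, `co_learning_loss`, and `weight` are all assumptions made for illustration.

```python
import math


def compression_level(input_bits: int, latent_dim: int) -> float:
    """Fraction of the fingerprint length removed by the encoder.

    Assumed definition: 1 - latent_dim / input_bits.
    """
    if not 0 < latent_dim <= input_bits:
        raise ValueError("latent_dim must be in (0, input_bits]")
    return 1.0 - latent_dim / input_bits


def binary_cross_entropy(bits, probs, eps=1e-7):
    """Mean binary cross-entropy between the original fingerprint bits
    and the decoder's reconstructed bit probabilities."""
    total = 0.0
    for b, p in zip(bits, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total -= b * math.log(p) + (1 - b) * math.log(1.0 - p)
    return total / len(bits)


def co_learning_loss(bits, probs, y_true, y_pred, weight=0.5):
    """Reconstruction loss plus a weighted squared error on the property
    predicted from the latent code (hypothetical weighting scheme)."""
    return binary_cross_entropy(bits, probs) + weight * (y_true - y_pred) ** 2


# A 2048-bit fingerprint squeezed into 256 latent dimensions.
print(compression_level(2048, 256))
```

Under this assumed definition, a 2048-bit fingerprint would exceed the 90 % compression level once the latent code has fewer than 205 dimensions. Setting `weight` to zero recovers a plain autoencoder objective, while larger values pull the latent code toward the property being predicted.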

Keywords: QSAR; deep learning; entropy; latent space; neural network.

MeSH terms

  • Algorithms*
  • Machine Learning
  • Neural Networks, Computer*