Evaluating molecular fingerprint-based models of drug side effects against a statistical control

Berk A Alpay; Mark Gosink; Derek Aguiar

doi:10.1016/j.drudis.2022.103364

Drug Discov Today. 2022 Nov;27(11):103364. doi: 10.1016/j.drudis.2022.103364. Epub 2022 Sep 14.

Authors

Berk A Alpay¹, Mark Gosink², Derek Aguiar³

Affiliations

¹ Systems, Synthetic, and Quantitative Biology Program, Harvard University, Cambridge, MA 02138, USA. Electronic address: berk_alpay@g.harvard.edu.
² Pfizer Inc., Groton, CT 06340, USA.
³ Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA.

PMID: 36115633

Abstract

Many machine learning models use molecular fingerprints of drugs to predict side effects. Characterizing their skill is necessary for understanding their usefulness in pharmaceutical development. Here, we analyze a statistical control of side effect prediction skill, develop a pipeline for benchmarking models, and evaluate how well existing models predict side effects identified in pharmaceutical documentation. We demonstrate that molecular fingerprints are useful for ranking drugs by their likelihood of causing a given side effect. However, when ranking the likelihoods of many possible side effects for one or more drugs, predictions benefit only marginally from molecular fingerprints and display at most modest overall skill at identifying which side effects do and do not occur.
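The difference between the two ranking tasks can be illustrated with a small synthetic sketch. It is not the authors' pipeline or their control: the abstract does not specify either, so the control used here (scoring every drug by how often each side effect occurs overall) is an assumption, as are the data, the names auroc and mean_defined, and the choice of AUROC as the metric. The labels are generated without any drug-specific signal, so nothing a fingerprint could capture is present.

```python
import random

def auroc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when only one class is present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_defined(values):
    """Average the values that are not None."""
    vals = [v for v in values if v is not None]
    return sum(vals) / len(vals)

random.seed(0)
n_drugs, n_effects = 200, 30

# Hypothetical base rates: side effects range from rare (0.05) to common (0.60).
base_rate = [0.05 + 0.55 * j / (n_effects - 1) for j in range(n_effects)]

# Synthetic drug-by-side-effect labels that depend only on the base rate,
# so there is no structural (fingerprint-like) information to learn.
Y = [[1 if random.random() < base_rate[j] else 0 for j in range(n_effects)]
     for _ in range(n_drugs)]

# Assumed control: the same score for every drug, equal to the observed
# frequency of each side effect. A real benchmark would estimate this on
# training drugs only; it is computed in-sample here for brevity.
freq = [sum(Y[i][j] for i in range(n_drugs)) / n_drugs for j in range(n_effects)]

# Task 1: for each side effect, rank drugs. The control gives every drug the
# same score, so all comparisons are ties.
per_effect_mean = mean_defined(
    auroc([Y[i][j] for i in range(n_drugs)], [freq[j]] * n_drugs)
    for j in range(n_effects)
)

# Task 2: for each drug, rank side effects. The control ranks common side
# effects above rare ones, which already carries information.
per_drug_mean = mean_defined(auroc(Y[i], freq) for i in range(n_drugs))

print(f"rank drugs within a side effect: {per_effect_mean:.3f}")
print(f"rank side effects within a drug: {per_drug_mean:.3f}")
```

Under these assumptions the control is uninformative when ranking drugs for a fixed side effect but scores above chance when ranking side effects for a fixed drug, which is one way a model's apparent skill on the second task can shrink once it is compared against a control.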

Keywords: Benchmarking; Chemical structure; Machine learning; Side effects.

Publication types

Review