Augmented drug combination dataset to improve the performance of machine learning models predicting synergistic anticancer effects

Sci Rep. 2024 Jan 18;14(1):1668. doi: 10.1038/s41598-024-51940-9.

Abstract

Combination therapy has gained popularity in cancer treatment as it enhances the treatment efficacy and overcomes drug resistance. Although machine learning (ML) techniques have become an indispensable tool for discovering new drug combinations, the data on drug combination therapy currently available may be insufficient to build high-precision models. We developed a data augmentation protocol to unbiasedly scale up the existing anti-cancer drug synergy dataset. Using a new drug similarity metric, we augmented the synergy data by substituting a compound in a drug combination instance with another molecule that exhibits highly similar pharmacological effects. Using this protocol, we were able to upscale the AZ-DREAM Challenges dataset from 8798 to 6,016,697 drug combinations. Comprehensive performance evaluations show that ML models trained on the augmented data consistently achieve higher accuracy than those trained solely on the original dataset. Our data augmentation protocol provides a systematic and unbiased approach to generating more diverse and larger-scale drug combination datasets, enabling the development of more precise and effective ML models. The protocol presented in this study could serve as a foundation for future research aimed at discovering novel and effective drug combinations for cancer treatment.

MeSH terms

  • Computational Biology* / methods
  • Drug Combinations
  • Drug Synergism
  • Drug Therapy, Combination
  • Machine Learning*

Substances

  • Drug Combinations