ADis-QSAR: a machine learning model based on biological activity differences of compounds

J Comput Aided Mol Des. 2023 Sep;37(9):435-451. doi: 10.1007/s10822-023-00517-1. Epub 2023 Jun 29.

Abstract

Drug candidates identified by the pharmaceutical industry typically have unique structural characteristics that allow them to interact strongly and specifically with their biological targets. Identifying these characteristics is a key challenge in developing new drugs, and quantitative structure-activity relationship (QSAR) analysis has generally been used for this task. QSAR models with good predictive power reduce the cost and time invested in compound development. Building such models depends on how well the differences between "active" and "inactive" compound groups can be conveyed to the model during learning. Efforts to address this issue have included generating "molecular descriptors" that express the structural characteristics of compounds in compressed form. From the same perspective, we developed the Activity Differences-Quantitative Structure-Activity Relationship (ADis-QSAR) model, which generates molecular descriptors that convey group features more explicitly through a pair system that directly connects the active and inactive groups. We trained the model with popular machine learning algorithms (Support Vector Machine, Random Forest, XGBoost and Multi-Layer Perceptron) and evaluated it using accuracy, area under the curve, precision and specificity. The Support Vector Machine performed better than the other algorithms. Notably, the ADis-QSAR model showed significant improvements over the baseline model in meaningful scores such as precision and specificity, even on datasets with dissimilar chemical spaces. This model reduces the risk of selecting false positive compounds, improving the efficiency of drug development.
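
The abstract does not say how the pair system encodes a pair or how pairs are labelled, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each compound is already a fixed-length binary fingerprint and that a pair descriptor is the element-wise XOR of an active and an inactive fingerprint; the names `pair_descriptor`, `build_pairs` and `screening_metrics` are hypothetical. The second half uses the standard definitions of accuracy, precision and specificity to show why the last two scores track false positives.

```python
from itertools import product


def pair_descriptor(fp_a, fp_b):
    """Element-wise XOR of two equal-length binary fingerprints.

    Assumption: XOR is used here only to illustrate a descriptor that
    marks the bits in which two compounds differ.
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return [a ^ b for a, b in zip(fp_a, fp_b)]


def build_pairs(actives, inactives):
    """Connect every active to every inactive, one descriptor per pair."""
    return [pair_descriptor(a, i) for a, i in product(actives, inactives)]


def screening_metrics(y_true, y_pred):
    """Accuracy, precision and specificity for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Both scores below have false positives in the denominator,
        # so fewer false positives raises both.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Toy 4-bit fingerprints (made up for illustration, not real compounds).
actives = [[1, 1, 0, 0], [1, 0, 1, 0]]
inactives = [[0, 1, 0, 1]]
pairs = build_pairs(actives, inactives)

# Toy predictions: one true positive, one false positive, two true negatives.
y_true = [1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0]
metrics = screening_metrics(y_true, y_pred)

print(pairs)
print(metrics)
```

In a real workflow the fingerprints would come from a cheminformatics toolkit and the pair descriptors would be passed to a classifier such as a Support Vector Machine; both steps are omitted here to keep the sketch dependency-free.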

Keywords: Machine learning; Molecular fingerprint; QSAR; Virtual screening.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Drug Development
  • Machine Learning*
  • Neural Networks, Computer
  • Quantitative Structure-Activity Relationship*
  • Support Vector Machine