Construction of An Oral Bioavailability Prediction Model Based on Machine Learning for Evaluating Molecular Modifications

J Pharm Sci. 2024 May;113(5):1155-1167. doi: 10.1016/j.xphs.2024.02.026. Epub 2024 Feb 29.

Abstract

Objective: This study aims to explore the impact of ADME on the Oral Bioavailability (OB) of drugs and to construct a machine learning model for OB prediction. The model is then applied to predict the OB of modified berberine and atenolol molecules to obtain structures with higher OB.

Methods: Initially, a drug OB database was established, and corresponding ADME characteristics were obtained. The relationship between ADME and OB was analyzed using machine learning, with Morgan fingerprints serving as molecular descriptors. Compounds from the database were input into Random Forest, XGBoost, CatBoost, and LightGBM machine learning models to train the OB prediction model and evaluate its performance. Subsequently, berberine and atenolol were modified using ChemDraw software, with ten different substituents for mono-substitution and chlorine atoms for a full range of double substitutions. The modified molecular structures were converted into the same format as the training set for OB prediction. The predicted OB values of the modified structures of berberine and atenolol were compared.
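
The enumeration step described above (each substituent placed at each site for mono-substitution, plus chlorine at every pair of sites for double substitution) can be sketched as follows. This is only an illustrative sketch: the abstract does not list the ten substituents or the substitutable sites, so the site labels, the substituent list, and the function name `enumerate_variants` are hypothetical placeholders, and the structures in the study were drawn in ChemDraw rather than generated by code.

```python
from itertools import combinations


def enumerate_variants(positions, substituents, double_sub="Cl"):
    """Enumerate substitution plans for one parent molecule.

    Each plan is a dict mapping a site label to the group placed there.
    Mono-substitution: every substituent at every site.
    Double substitution: `double_sub` at every unordered pair of sites.
    """
    mono = [{pos: sub} for pos in positions for sub in substituents]
    double = [
        {first: double_sub, second: double_sub}
        for first, second in combinations(positions, 2)
    ]
    return mono, double


# Hypothetical inputs (assumptions, not taken from the paper).
positions = ["C1", "C2", "C3", "C4"]
substituents = ["F", "Cl", "Br", "OH", "NH2", "CH3", "OCH3", "CF3", "CN", "NO2"]

mono, double = enumerate_variants(positions, substituents)
print(len(mono), len(double))  # 40 mono plans (4 sites x 10 groups), 6 double plans (4 choose 2)
```

Each plan would then have to be turned into a concrete structure and encoded in the same descriptor format as the training set before a trained model could score it.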

Results: An OB database of 386 drugs was obtained. It was found that smaller molecular weight and a higher number of rotatable bonds (ten or fewer) could potentially lead to higher OB. The four machine learning models were evaluated using MSE, R² score, MAE, and MFE as metrics, with Random Forest performing the best. The models' predictions for the test set were particularly accurate when OB ranged from 30% to 90%. After mono-substitution and double substitution of berberine and atenolol, the OB of both drugs was significantly improved.
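
As a rough illustration of three of the metrics named above, the sketch below computes MSE, MAE, and the R² score from paired observed and predicted OB values, and repeats the calculation on the 30% to 90% OB window. It rests on several assumptions: the function names and the example values are invented for illustration and are not data from the study, and MFE is omitted because the abstract does not define it.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, and R^2 for paired observed/predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "mse": ss_res / n,
        "mae": sum(abs(e) for e in errors) / n,
        # R^2 is undefined when all observed values are identical.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


def metrics_in_range(y_true, y_pred, low=30.0, high=90.0):
    """Compute the same metrics only for samples whose observed OB is in [low, high]."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if low <= t <= high]
    return regression_metrics([t for t, _ in pairs], [p for _, p in pairs])


# Made-up OB percentages for illustration only.
observed = [20.0, 40.0, 60.0, 80.0]
predicted = [25.0, 35.0, 65.0, 75.0]

overall = regression_metrics(observed, predicted)
windowed = metrics_in_range(observed, predicted)
print(overall)
print(windowed)
```

Reporting the windowed metrics next to the overall ones makes it easier to see whether accuracy is concentrated in the mid-range of OB values.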

Conclusions: This study found that some ADME properties of molecules do not have an absolute impact on OB. The database played a decisive role in building the machine learning OB prediction model, and the performance of the model was evaluated based on predictions within the range where it generalized well. In most cases, mono-substitution and double substitution were beneficial for enhancing the OB of berberine and atenolol. In summary, this study constructed a machine learning regression model that can accurately predict drug OB and can, to some extent, guide drug design toward higher OB.

Keywords: Atenolol; Berberine; Machine learning; Molecular modification; Oral bioavailability.

MeSH terms

  • Atenolol*
  • Berberine*
  • Biological Availability
  • Machine Learning
  • Software

Substances

  • Atenolol
  • Berberine