Purpose: One knowledge gap in subcutaneous (SC) delivery is its unpredictable and variable bioavailability. This study aimed to develop machine learning methods to predict whether the bioavailability of a monoclonal antibody (mAb) is ≥70% or <70%, without requiring complete knowledge of the mechanisms and causal relationships between inputs and outputs.
Methods: A database of mAb SC products was built. Models were trained and validated on this database, with a set of inputs (product properties) mapped to the output (bioavailability) using several machine learning algorithms. Dimensionality reduction was performed using principal component analysis (PCA).
Results: The bioavailability of the mAb products investigated ranged from 35% to 90%. The tree-based methods, including random forest (RF), adaptive boosting (AdaBoost), and decision tree (DT), showed the best predictive performance and generalization in bioavailability classification. Models based on the multi-layer perceptron (MLP), Gaussian naïve Bayes (GaussianNB), and k-nearest neighbors (kNN) algorithms also provided acceptable prediction accuracy.
Conclusion: Machine learning is a potential tool for predicting mAb bioavailability. Because all input features were obtained from theoretical calculations and predictions rather than experiments, the models may be particularly applicable to early-stage research activities such as mAb molecule triage, design/optimization, mutant screening, molecule selection, and formulation design.
Keywords: bioavailability; machine learning; material attribute; monoclonal antibody; subcutaneous.
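Illustrative sketch: The classification task described under Methods (product properties as inputs, bioavailability binarized at 70% as the output) can be illustrated with a minimal example. This is a sketch under stated assumptions, not the study's implementation: the data are synthetic placeholders, the two features and their relationship to bioavailability are invented for illustration, and leave-one-out validation with a hand-written kNN classifier stands in for whatever validation scheme and software the authors actually used. PCA and the other algorithms are omitted for brevity.

```python
import math
import random

THRESHOLD = 70.0  # percent; the class boundary stated in the abstract


def label(bioavailability_pct):
    """Return 1 if bioavailability is >= 70%, otherwise 0."""
    return 1 if bioavailability_pct >= THRESHOLD else 0


def fit_scaler(rows):
    """Compute per-feature mean and standard deviation from training rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A zero standard deviation is replaced by 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(row, means, stds):
    """Standardize one feature row with a previously fitted scaler."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = sum(y for _, y in nearest)
    return 1 if 2 * votes > k else 0


def leave_one_out_accuracy(xs, ys, k=3):
    """Hold out each product once; fit the scaler on the remaining products only."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        means, stds = fit_scaler(train_x)
        scaled_train = [scale(r, means, stds) for r in train_x]
        pred = knn_predict(scaled_train, train_y, scale(xs[i], means, stds), k)
        correct += int(pred == ys[i])
    return correct / len(xs)


# SYNTHETIC placeholder data (assumption): 40 hypothetical products with two
# made-up computed features; the linear relationship below is invented.
rng = random.Random(0)
features = [[rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)] for _ in range(40)]
bioavailability = [
    min(90.0, max(35.0, 40.0 + 45.0 * f[0] + rng.gauss(0.0, 5.0)))
    for f in features
]
labels = [label(b) for b in bioavailability]

accuracy = leave_one_out_accuracy(features, labels, k=3)
print(f"Leave-one-out accuracy on synthetic data: {accuracy:.2f}")
```

The scaler is refit inside each fold so that the held-out product does not influence the standardization, which keeps the accuracy estimate from being optimistically biased. The printed accuracy describes only the synthetic data and says nothing about the performance reported in the study.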