In Silico Prediction of the Toxicity of Nitroaromatic Compounds: Application of Ensemble Learning QSAR Approach

Toxics. 2022 Dec 1;10(12):746. doi: 10.3390/toxics10120746.

Abstract

In this work, a dataset of more than 200 nitroaromatic compounds is used to develop Quantitative Structure-Activity Relationship (QSAR) models for estimating in vivo toxicity, expressed as the 50% lethal dose in rats (LD50). An initial set of 4885 molecular descriptors was generated and used to build Support Vector Regression (SVR) models. The two best SVR models, SVR_A and SVR_B, were combined into an Ensemble Model by means of Multiple Linear Regression (MLR). The Ensemble Model outperformed the base SVR models on the training set (R2 = 0.88), the validation set (R2 = 0.95), and the true external test set (R2 = 0.92). The models were also internally validated by 5-fold cross-validation and Y-scrambling experiments, which indicated high levels of goodness-of-fit, robustness, and predictivity. The contribution of each descriptor to the predicted toxicity was assessed using the Accumulated Local Effect (ALE) technique. The proposed approach provides an important tool for assessing the toxicity of nitroaromatic compounds, combining the ensemble QSAR model with an analysis of how the involved descriptors relate structure to toxicity.

Keywords: Accumulated Local Effect; QSAR; QSTR; ensemble model; machine learning; nitroaromatic compounds; support vector machine; toxicity.
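
The ensemble step summarized in the abstract (two base model outputs combined by MLR, then checked by Y-scrambling) can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the two "base model" prediction vectors are simply the true values plus random noise standing in for SVR_A and SVR_B outputs, and every function and variable name below is hypothetical. It uses only the Python standard library.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(features, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_mlr(coefs, features):
    """Apply the fitted intercept and slopes to each feature row."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], f)) for f in features]


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


random.seed(0)

# Synthetic stand-in for observed toxicity values (assumption, not real data).
y = [random.gauss(2.5, 0.6) for _ in range(100)]

# Stand-ins for the outputs of two base models: truth plus independent noise.
pred_a = [v + random.gauss(0.0, 0.3) for v in y]
pred_b = [v + random.gauss(0.0, 0.3) for v in y]

# Ensemble: regress the observed values on the two base predictions.
stacked = list(zip(pred_a, pred_b))
coefs = fit_mlr(stacked, y)
pred_ens = predict_mlr(coefs, stacked)

r2_a = r_squared(y, pred_a)
r2_b = r_squared(y, pred_b)
r2_ens = r_squared(y, pred_ens)

# Y-scrambling: shuffle the responses, refit, and expect the fit to collapse.
y_scrambled = y[:]
random.shuffle(y_scrambled)
coefs_scr = fit_mlr(stacked, y_scrambled)
r2_scrambled = r_squared(y_scrambled, predict_mlr(coefs_scr, stacked))

print("R2 base A:", round(r2_a, 3))
print("R2 base B:", round(r2_b, 3))
print("R2 ensemble (training):", round(r2_ens, 3))
print("R2 after Y-scrambling:", round(r2_scrambled, 3))
```

On the data used for fitting, a least-squares combination with an intercept cannot score a lower R2 than either input on its own, so the training-set gain alone says little. That is why held-out validation, an external test set, and Y-scrambling of the kind reported in the abstract are needed to judge whether an ensemble is genuinely predictive.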