HIV-1 integrase (IN) is a promising target for anti-AIDS therapy, and LEDGF/p75 has been shown to enhance the strand transfer activity of HIV-1 integrase in vitro. Blocking the interaction between IN and LEDGF/p75 is an effective way to inhibit HIV replication. In this work, 274 LEDGF/p75-IN inhibitors were collected as a dataset. Support Vector Machine (SVM), Decision Tree (DT), Function Tree (FT) and Random Forest (RF) were applied to build several computational models for predicting whether a compound is an active or a weakly active LEDGF/p75-IN inhibitor. Each compound was represented by MACCS fingerprints and CORINA Symphony descriptors. The prediction accuracies of all the models on the test sets are over 70 %. The best model, Model 3B, built with FT, achieved a prediction accuracy of 81.08 % and a Matthews Correlation Coefficient (MCC) of 0.62 on the test set. We found that hydrogen-bond and hydrophobic interactions are important for the bioactivity of an inhibitor.
Keywords: Classification model; Extended connectivity fingerprints (ECFP_4); HIV-1 integrase (IN) LEDGF/p75 inhibitor; Machine learning method.
© 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
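The abstract reports model quality as prediction accuracy and Matthews Correlation Coefficient (MCC). As a reminder of how these two metrics are defined for a binary active / weakly active classifier, here is a minimal sketch using the standard formulas. The function names and the confusion-matrix counts are illustrative assumptions; they are not taken from the paper's data or code.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of compounds classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (NOT the paper's test set):
# "active" is treated as the positive class.
tp, tn, fp, fn = 8, 7, 2, 3

acc_value = accuracy(tp, tn, fp, fn)
mcc_value = mcc(tp, tn, fp, fn)
print(f"accuracy = {acc_value:.2%}, MCC = {mcc_value:.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the active and weakly active classes are unevenly sized.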