Machine learning methods for multi-walled carbon nanotubes (MWCNT) genotoxicity prediction

Marianna Kotzabasaki; Iason Sotiropoulos; Costas Charitidis; Haralambos Sarimveis

doi:10.1039/d0na00600a

Nanoscale Adv. 2021 Apr 12;3(11):3167-3176. doi: 10.1039/d0na00600a. eCollection 2021 Jun 1.

Authors

Marianna Kotzabasaki¹, Iason Sotiropoulos¹, Costas Charitidis¹, Haralambos Sarimveis¹

Affiliation

¹ School of Chemical Engineering, National Technical University of Athens, 9 Heroon Polytechneiou Street, Zografou Campus, 15780 Athens, Greece. mariannako@chemeng.ntua.gr; hsarimv@central.ntua.gr; +30 2107723138, +30 2107723236, +30 2107723237.

Abstract

Multi-walled carbon nanotubes (MWCNTs) consist of multiple single-walled carbon nanotubes (SWCNTs) nested inside one another to form concentric cylinders. These nanomaterials are widely used in industrial and biomedical applications because of their unique physicochemical characteristics. However, previous studies have shown that exposure to MWCNTs may lead to toxicity, and that some of their physicochemical properties can influence their toxicological profiles. In silico modelling can be applied as a faster and less costly alternative to experimental (in vivo and in vitro) testing for the hazard characterization of MWCNTs. This study aims to develop a fully validated predictive nanoinformatics model, based on statistical and machine learning approaches, for the accurate prediction of the genotoxicity of different types of MWCNTs. To this end, several computational workflows were designed, combining an unsupervised technique (Principal Component Analysis, "PCA"), supervised classification techniques (Support Vector Machine, "SVM"; Random Forest, "RF"; Logistic Regression, "LR"; and Naïve Bayes, "NB") and Bayesian optimization. The Recursive Feature Elimination (RFE) method was applied to select the most important variables. An RF model using only three features was selected as the most efficient for predicting the genotoxicity of MWCNTs, exhibiting 80% accuracy on external validation and high classification probabilities. The most informative features selected by the model were "Length", "Zeta average" and "Purity".
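
As a rough illustration of the kind of workflow the abstract describes (RFE to reduce the descriptors to three features, followed by an RF classifier evaluated on a held-out external set), a minimal scikit-learn sketch might look as follows. This is not the authors' code: the data are synthetic, the descriptor names, sample counts, split ratio and RF settings are all assumptions, and the PCA and Bayesian optimization steps are omitted for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (NOT the study's dataset): 100 samples, 8 descriptors,
# binary label standing in for genotoxic / non-genotoxic.
X, y = make_classification(
    n_samples=100, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
# Hypothetical descriptor names, used only to report which columns were kept.
feature_names = [f"descriptor_{i}" for i in range(X.shape[1])]

# Hold out an external validation set that plays no part in feature selection.
# The 80/20 split is an assumption, not a value taken from the paper.
X_train, X_ext, y_train, y_ext = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# RFE repeatedly fits an RF and drops the least important feature until
# three remain; a second RF is then trained on those three features only.
pipeline = Pipeline([
    ("rfe", RFE(RandomForestClassifier(n_estimators=200, random_state=0),
                n_features_to_select=3)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
pipeline.fit(X_train, y_train)

# Names of the three descriptors retained by RFE.
mask = pipeline.named_steps["rfe"].support_
selected = [name for name, keep in zip(feature_names, mask) if keep]

# External validation: predicted classes, class probabilities and accuracy.
y_pred = pipeline.predict(X_ext)
proba = pipeline.predict_proba(X_ext)
accuracy = accuracy_score(y_ext, y_pred)

print("Selected features:", selected)
print("External accuracy:", accuracy)
```

Placing RFE inside the pipeline means that feature selection is fitted on the training data only, so the external set remains untouched until the final evaluation. With real data, the columns of `X` would be the measured physicochemical descriptors rather than synthetic values.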