Harnessing Semi-Supervised Machine Learning to Automatically Predict Bioactivities of Per- and Polyfluoroalkyl Substances (PFASs)

Environ Sci Technol Lett. 2022 Aug 26;10(11):1017-1022. doi: 10.1021/acs.estlett.2c00530. eCollection 2023 Nov 14.

Abstract

Many per- and polyfluoroalkyl substances (PFASs) pose significant health hazards because of their bioactive, persistent, and bioaccumulative properties. However, assessing the bioactivities of PFASs is both time-consuming and costly because of the sheer number and expense of the in vivo and in vitro biological experiments required. To address this, we harnessed new unsupervised/semi-supervised machine learning models to automatically predict the bioactivities of PFASs against various human biological targets, including enzymes, genes, proteins, and cell lines. Our semi-supervised metric learning models were used to predict the bioactivities of the PFASs in the recent Organisation for Economic Co-operation and Development (OECD) report list, which contains 4730 PFASs used across a broad range of industrial and consumer applications. Our work provides the first semi-supervised machine learning study of structure-activity relationships for predicting possible bioactivities across a variety of PFAS species.
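The general idea of semi-supervised bioactivity prediction, in which a few experimentally labeled compounds inform predictions for many unlabeled ones through structural similarity, can be illustrated with a minimal sketch. This is not the metric learning model used in the study: it swaps in a simple nearest-neighbor self-training loop over Tanimoto similarity, and the fingerprints, compound names, labels, and the `threshold` parameter are all hypothetical toy values chosen for illustration.

```python
from typing import Dict, FrozenSet

# A fingerprint is modeled as the set of "on" bit indices.
# These are toy values (an assumption), not real PFAS fingerprints.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def self_train(
    fps: Dict[str, Fingerprint],
    labels: Dict[str, int],
    threshold: float = 0.5,
) -> Dict[str, int]:
    """Propagate labels from labeled to unlabeled compounds.

    In each round, the single most similar (unlabeled, labeled) pair with
    similarity >= threshold is found, and the unlabeled compound inherits
    the label. Newly labeled compounds can label others in later rounds.
    Compounds with no sufficiently similar neighbor stay unlabeled.
    """
    labels = dict(labels)  # copy so the caller's dict is not modified
    while True:
        best = None  # (similarity, compound name, inherited label)
        for name, fp in fps.items():
            if name in labels:
                continue
            for other, lab in labels.items():
                sim = tanimoto(fp, fps[other])
                if sim >= threshold and (best is None or sim > best[0]):
                    best = (sim, name, lab)
        if best is None:
            return labels
        labels[best[1]] = best[2]


# Hypothetical toy data: 1 = active, 0 = inactive.
toy_fps = {
    "A": frozenset({1, 2, 3, 4}),
    "B": frozenset({1, 2, 3, 5}),
    "C": frozenset({1, 2, 5, 6}),
    "D": frozenset({10, 11, 12}),
    "E": frozenset({10, 11, 13}),
    "F": frozenset({20, 21}),
}
toy_labels = {"A": 1, "D": 0}

predicted = self_train(toy_fps, toy_labels, threshold=0.5)
print(predicted)
```

In this toy run, compound C is not similar enough to the labeled compound A to be labeled directly, but it becomes labeled once its closer neighbor B has been labeled, which is the benefit that unlabeled data brings in a semi-supervised setting. Compound F shares no bits with any other compound and is left without a prediction.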