Machine learning to predict developmental neurotoxicity with high-throughput data from 2D bio-engineered tissues

Finn Kuusisto; Vitor Santos Costa; Zhonggang Hou; James Thomson; David Page; Ron Stewart

doi:10.1109/icmla.2019.00055

Proc Int Conf Mach Learn Appl. 2019 Dec;2019:293-298. doi: 10.1109/icmla.2019.00055. Epub 2020 Feb 17.

Authors

Finn Kuusisto¹, Vitor Santos Costa², Zhonggang Hou¹, James Thomson¹,³,⁴, David Page⁵, Ron Stewart¹

Affiliations

¹ Morgridge Institute for Research, Regenerative Biology, Madison, WI, USA.
² University of Porto, Department of Computer Science, Porto, Portugal.
³ University of Wisconsin, Department of Cell and Regenerative Biology, Madison, WI, USA.
⁴ University of California, Department of Molecular, Cellular, and Developmental Biology, Santa Barbara, CA, USA.
⁵ Duke University, Department of Biostatistics and Bioinformatics, Durham, NC, USA.

Abstract

There is a growing need for fast and accurate methods for testing developmental neurotoxicity across several chemical exposure sources. Current approaches, such as in vivo animal studies and assays of animal and human primary cell cultures, suffer from challenges related to time, cost, and applicability to human physiology. Prior work has demonstrated success employing machine learning to predict developmental neurotoxicity using gene expression data collected from human 3D tissue models exposed to various compounds. The 3D model is biologically similar to developing neural structures, but its complexity necessitates extensive expertise and effort to employ. By instead focusing solely on constructing an assay of developmental neurotoxicity, we propose that a simpler 2D tissue model may prove sufficient. We thus compare the accuracy of predictive models trained on data from a 2D tissue model with those trained on data from a 3D tissue model, and find the 2D model to be substantially more accurate. Furthermore, we find the 2D model to be more robust under stringent gene set selection, whereas the 3D model suffers substantial accuracy degradation. While both approaches have advantages and disadvantages, we propose that our described 2D approach could be a valuable tool for decision makers when prioritizing neurotoxicity screening.

Keywords: gene expression; machine learning; neurotoxicity; tissue model.

Grants and funding

UH3 TR000506/TR/NCATS NIH HHS/United States