Assessing Graph-based Deep Learning Models for Predicting Flash Point

Xiaoyu Sun; Nathaniel J Krakauer; Alexander Politowicz; Wei-Ting Chen; Qiying Li; Zuoyi Li; Xianjia Shao; Alfred Sunaryo; Mingren Shen; James Wang; Dane Morgan

doi:10.1002/minf.201900101

Mol Inform. 2020 Jun;39(6):e1900101. doi: 10.1002/minf.201900101. Epub 2020 Feb 20.

Affiliation

¹ Dept. of Materials Science and Engineering, 244 MSE, University of Wisconsin, Madison, 53562.

PMID: 32077235
DOI: 10.1002/minf.201900101

Abstract

Knowledge of the flash points of organic molecules plays an important role in preventing flammability hazards. Large databases of measured values exist, but millions of compounds remain unmeasured. To rapidly extend existing data to new compounds, many researchers have used quantitative structure-property relationship (QSPR) analysis to predict flash points. In recent years, graph-based deep learning (GBDL) has emerged as a powerful alternative to traditional QSPR. In this paper, GBDL models were applied to flash point prediction for the first time. We assessed the performance of two GBDL models, a message-passing neural network (MPNN) and a graph convolutional neural network (GCNN), by comparing them against 12 previous QSPR studies that used more traditional methods. Our results show that MPNN outperforms GCNN and performs slightly worse than, but comparably to, previous QSPR studies. The average $R^{2}$ and mean absolute error (MAE) scores of MPNN are, respectively, 2.3% lower and 2.0 K higher than those of previous comparable studies. To further explore GBDL models, we collected the largest flash point dataset to date, which contains 10,575 unique molecules. The optimized MPNN gives a test-data $R^{2}$ of 0.803 and an MAE of 17.8 K on the complete dataset. We also extracted 5 datasets from our integrated dataset based on molecular type (acids, organometallics, organogermaniums, organosilicons, and organotins) and explored the quality of the model in these classes.
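
The two metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes that $R^{2}$ means the coefficient of determination (the abstract does not say which definition was used) and that flash points are in kelvin. The function names and the four data points are hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from the paper or its dataset.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical flash points in kelvin (illustrative values, not from the paper).
measured = [300.0, 320.0, 340.0, 360.0]
predicted = [310.0, 315.0, 335.0, 370.0]

mae = mean_absolute_error(measured, predicted)  # (10 + 5 + 5 + 10) / 4 = 7.5 K
r2 = r_squared(measured, predicted)             # 1 - 250 / 2000 = 0.875

print(f"MAE = {mae:.1f} K, R^2 = {r2:.3f}")
```

Under this definition, a lower MAE and an $R^{2}$ closer to 1 both indicate better agreement with measurement, which is the sense in which the abstract compares MPNN with earlier QSPR studies.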

Keywords: Domain of applicability; Flash point; Machine learning; Neural network; Quantitative structure-property relationship; Robust model prediction.

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms*
Databases as Topic
Deep Learning*
Models, Theoretical*
Statistics as Topic

Associated data

figshare/10.6084/m9.figshare.9275210