Physics-Informed Neural Networks with Group Contribution Methods

J Chem Theory Comput. 2023 Jul 11;19(13):4163-4171. doi: 10.1021/acs.jctc.3c00195. Epub 2023 Jun 9.

Abstract

Thermophysical properties of organic compounds are used in countless scientific, engineering, and industrial settings in developing theories, designing new systems and devices, analyzing costs and risks, and improving existing infrastructure. Often, due to costs, safety, prior interest, or procedural difficulties, experimental values for desired properties are not available and must be predicted. The literature is filled with prediction techniques, but even the best traditional methods have significant errors compared to what is possible considering experimental uncertainty. Recently, machine learning and artificial intelligence techniques have been applied to the property prediction problem, but the examples to date do not extrapolate well outside the data set used for training the model. This work demonstrates a solution to this problem by combining chemistry and physics when training the model, building upon prior traditional and machine learning methods. Two case studies are presented. The first is for the parachor, which is used for surface tension prediction. Surface tensions are needed to design distillation columns, adsorption processes, gas-liquid reactors, and liquid-liquid extractors; to improve oil reservoir recovery; and to undertake environmental impact studies or remediation actions. A set of 277 compounds is divided into training, validation, and test sets, and a multilayered physics-informed neural network (PINN) is developed. The results demonstrate that adding physics-based constraints improves the extrapolation of deep learning models. Second, a set of 1600 compounds is utilized for training, validating, and testing a PINN to improve normal boiling point predictions based on group contribution methods and physics-based constraints. The results show that the PINN performs better than any other method, with a normal boiling point mean absolute error of 6.95 °C on training data and 11.2 °C on test data.
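As a minimal sketch of how a parachor connects to surface tension, the snippet below applies the classical Macleod-Sugden relation, sigma = [P (rho_L - rho_V) / M]^4, together with an additive group-contribution estimate of the parachor. This is an illustration under stated assumptions and not the model developed in this work: the function names, the group labels, and the unit convention (densities in g/cm^3, molar mass in g/mol, surface tension in mN/m) are assumptions chosen for the example.

```python
def parachor_from_groups(group_counts, group_values):
    """Additive group-contribution estimate: P = sum_i n_i * p_i.

    group_counts: dict mapping a group label to its count in the molecule.
    group_values: dict mapping a group label to its parachor contribution.
    Both dictionaries are hypothetical inputs supplied by the caller.
    """
    return sum(n * group_values[g] for g, n in group_counts.items())


def surface_tension_from_parachor(parachor, rho_liquid, rho_vapor, molar_mass):
    """Macleod-Sugden relation: sigma = [P * (rho_L - rho_V) / M] ** 4.

    Assumed units: densities in g/cm^3, molar mass in g/mol,
    surface tension returned in mN/m.
    """
    if molar_mass <= 0:
        raise ValueError("molar_mass must be positive")
    return (parachor * (rho_liquid - rho_vapor) / molar_mass) ** 4
```

Because the relation raises the parachor term to the fourth power, a relative error in the predicted parachor is amplified roughly fourfold in the surface tension, which is one reason accurate parachor prediction matters.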
Key observations are that (1) a balanced split by compound type is important for ensuring that each of the training, validation, and test sets contains representative compound families and (2) constraining group contributions to be positive improves predictions on the test set. While this work demonstrates improvements for only surface tension and normal boiling point, the results offer significant hope that PINNs can improve prediction of other relevant thermophysical properties over existing approaches.
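The two key observations can be illustrated with a short sketch. It is a simplified stand-in and not the PINN from this work: a linear group-contribution model replaces the neural network, positivity is enforced through a softplus reparameterization (one of several possible ways to impose such a constraint), and the function names, split fractions, and optimizer settings are all assumptions made for the example.

```python
import numpy as np


def balanced_split(families, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split sample indices so each compound family is spread over all sets.

    families: sequence of family labels, one per compound (hypothetical input).
    fractions: assumed train/validation/test fractions applied per family.
    """
    rng = np.random.default_rng(seed)
    families = np.asarray(families)
    train, val, test = [], [], []
    for fam in np.unique(families):
        # Shuffle the indices of this family, then cut them by fraction.
        idx = rng.permutation(np.flatnonzero(families == fam))
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])
    return np.array(train), np.array(val), np.array(test)


def softplus(theta):
    """Map unconstrained parameters to strictly positive values."""
    return np.logaddexp(0.0, theta)


def fit_positive_contributions(counts, targets, steps=5000, lr=0.05):
    """Fit y ~ counts @ c with c = softplus(theta) > 0 by gradient descent.

    counts: (n_compounds, n_groups) matrix of group occurrences.
    targets: (n_compounds,) property values.
    """
    counts = np.asarray(counts, dtype=float)
    targets = np.asarray(targets, dtype=float)
    theta = np.zeros(counts.shape[1])
    for _ in range(steps):
        c = softplus(theta)
        residual = counts @ c - targets
        grad_c = 2.0 * counts.T @ residual / len(targets)  # d(MSE)/dc
        sigmoid = 1.0 / (1.0 + np.exp(-theta))              # dc/dtheta
        theta -= lr * grad_c * sigmoid
    return softplus(theta)
```

Splitting within each family keeps small families from landing entirely in one set, and the softplus mapping guarantees positive contributions without needing a constrained optimizer.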