Single-Point Extrapolation to the Complete Basis Set Limit through Deep Learning

Soren Holm; Pablo A Unzueta; Keiran Thompson; Todd J Martínez

doi:10.1021/acs.jctc.2c01298

J Chem Theory Comput. 2023 Jul 25;19(14):4474-4483. doi: 10.1021/acs.jctc.2c01298. Epub 2023 May 16.

Authors

Soren Holm^{1,2}, Pablo A Unzueta^{1,2}, Keiran Thompson^{1,2}, Todd J Martínez^{1,2}

Affiliations

¹ Department of Chemistry and The PULSE Institute, Stanford University, Stanford, California 94305, United States.
² SLAC National Accelerator Laboratory, Menlo Park, California 94025, United States.

PMID: 37192428
DOI: 10.1021/acs.jctc.2c01298

Abstract

Machine learning (ML) offers an attractive way to make predictions about molecular systems without running expensive electronic structure calculations. Once an ML model has been trained on ab initio data, it promises accurate predictions of molecular properties that were previously computationally infeasible. In this work, we develop and train a graph neural network model to correct the basis set incompleteness error (BSIE) between a small and a large basis set at the RHF and B3LYP levels of theory. Our results show that an ML model fitted to the BSIE generalizes better to systems not seen during training than a model fitted to the total potential. We test this ability by training on single molecules and evaluating on molecular complexes. We also show that ensemble models yield better-behaved potentials when the training data are insufficient. However, even when fitting only the BSIE, acceptable performance is achieved only when the training data sufficiently resemble the systems for which predictions are sought. For the B3LYP density functional, the final model, trained to predict the difference between the cc-pVDZ and cc-pV5Z potentials, has a test error of 0.184 kcal/mol, and the ensemble model accurately reproduces the large-basis-set interaction energy curves of the S66x8 dataset.
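The abstract's central idea, predicting only the BSIE and adding it to a cheap small-basis energy, can be illustrated with a short delta-learning sketch. This is not the authors' code. It assumes, hypothetically, that each trained model is a plain callable mapping a geometry to a predicted BSIE, that the ensemble prediction is the simple mean of its members (a common choice; the abstract does not state how members are combined), and that interaction energies use the supermolecular difference without counterpoise correction. The names `bsie_target`, `predict_large_basis`, and `interaction_energy` are illustrative only.

```python
"""Sketch of single-point basis-set extrapolation by delta learning.

Hypothetical setup (not the published implementation): each model is a
callable geometry -> predicted BSIE, and E(small basis) is already known.
"""
from statistics import mean, pstdev
from typing import Callable, Sequence, Tuple

Geometry = Sequence[Tuple[str, float, float, float]]  # (element, x, y, z)
BsieModel = Callable[[Geometry], float]               # assumed model interface


def bsie_target(e_small: float, e_large: float) -> float:
    """Training label: BSIE = E(large basis) - E(small basis), e.g. cc-pV5Z minus cc-pVDZ."""
    return e_large - e_small


def predict_large_basis(e_small: float, geometry: Geometry,
                        models: Sequence[BsieModel]) -> Tuple[float, float]:
    """Estimate E(large) as E(small) plus the ensemble-mean BSIE.

    Also returns the population standard deviation of the member predictions,
    a rough indicator of where the ensemble disagrees (e.g. sparse training data).
    """
    corrections = [model(geometry) for model in models]
    return e_small + mean(corrections), pstdev(corrections)


def interaction_energy(e_dimer: float, e_mono_a: float, e_mono_b: float) -> float:
    """Supermolecular interaction energy E(AB) - E(A) - E(B), as in S66x8-style curves."""
    return e_dimer - e_mono_a - e_mono_b


if __name__ == "__main__":
    # Toy placeholder inputs: arbitrary numbers and constant "models", not trained networks.
    water: Geometry = [("O", 0.0, 0.0, 0.0), ("H", 0.757, 0.586, 0.0), ("H", -0.757, 0.586, 0.0)]
    toy_models = [lambda g: -10.0, lambda g: -12.0]
    energy, spread = predict_large_basis(-100.0, water, toy_models)
    print(energy, spread)  # prints: -111.0 1.0
```

Fitting the difference rather than the total energy means the network only has to learn a comparatively small, smooth correction, which is one plausible reading of why the BSIE model transfers better from single molecules to complexes. The ensemble spread is included only as an optional diagnostic; the abstract does not say whether the authors use it.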