Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies

J Chem Theory Comput. 2013 Aug 13;9(8):3404-19. doi: 10.1021/ct400195d. Epub 2013 Jul 30.

Abstract

The accurate and reliable prediction of properties of molecules typically requires computationally intensive quantum-chemical calculations. Recently, machine learning techniques applied to ab initio calculations have been proposed as an efficient approach for describing the energies of molecules in their given ground-state structure throughout chemical compound space (Rupp et al. Phys. Rev. Lett. 2012, 108, 058301). In this paper we outline a number of established machine learning techniques and investigate the influence of the molecular representation on the methods performance. The best methods achieve prediction errors of 3 kcal/mol for the atomization energies of a wide variety of molecules. Rationales for this performance improvement are given together with pitfalls and challenges when applying machine learning approaches to the prediction of quantum-mechanical observables.