Learning to Make Chemical Predictions: the Interplay of Feature Representation, Data, and Machine Learning Methods

Chem. 2020 Jul 9;6(7):1527-1542. doi: 10.1016/j.chempr.2020.05.014. Epub 2020 Jun 16.

Abstract

Supervised machine learning has recently gained prominence as a source of new predictive approaches for applications in the chemical, biological, and materials sciences. In this Perspective, we focus on the interplay of the machine learning method with chemically motivated descriptors and with the size and type of data sets needed for molecular property prediction. Using nuclear magnetic resonance (NMR) chemical shift prediction as an example, we demonstrate that success depends on the choice of feature-extracted or real-space representations of chemical structures, on whether the molecular property data are abundant and whether they are experimentally or computationally derived, and on how these factors together influence the appropriate choice among popular machine learning methods drawn from deep learning, random forests, or kernel methods.