Quantum chemistry-machine learning approach for predicting and elucidating molecular hyperpolarizability: Application to [2.2]paracyclophane-containing push-pull polymers

J Chem Phys. 2021 Mar 28;154(12):124107. doi: 10.1063/5.0040342.

Abstract

Nonlinear optical properties of organic chromophores are of great interest in diverse photonic and optoelectronic applications. To elucidate general trends in molecular behavior, large amounts of data are required. Therefore, a computational approach that is both accurate and rapid can significantly advance the theoretical design of molecules. In this work, we combined quantum chemistry and machine learning (ML) to study the first hyperpolarizability (β) in [2.2]paracyclophane-containing push-pull compounds with various terminal donor/acceptor pairs and molecular lengths. To generate reference β values for ML, the ab initio elongation finite-field method was used, allowing us to treat long polymer chains with linear-scaling efficiency and high computational accuracy. A neural network (NN) model was built for β prediction, and the relevant molecular descriptors were selected using a genetic algorithm. The established NN model accurately reproduced the β values (R² > 0.99) of long molecules based on the input quantum chemical properties (dipole moment, frontier molecular orbitals, etc.) of only the shortest systems and additional information about the actual system length. To obtain general trends in the molecular descriptor-target property relationships learned by the NN, three approaches for explaining the ML decisions (i.e., partial dependence, accumulated local effects, and permutation feature importance) were used. The effect of donor/acceptor alternation on β in the studied systems was examined. The asymmetric extension of molecular regions end-capped with donors and acceptors produced unequal β responses. The results revealed how the electronic properties originating from the nature of substituents on the microscale controlled the magnitude of β according to the NN approximation.
The applied approach facilitates conceptual discoveries in chemistry by using ML both to (i) efficiently generate data and (ii) provide a source of information about causal correlations among system properties.
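
As an illustration of one of the three explanation techniques named above, permutation feature importance can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in the paper: the data are synthetic, a linear least-squares fit stands in for the trained NN, and the names `r2_score`, `permutation_importance`, and `predict` are hypothetical helpers introduced only for this example.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination R^2."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when each descriptor column is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = r2_score(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link to the target
            # while keeping its marginal distribution unchanged.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - r2_score(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return baseline, importances

# Synthetic stand-in data (assumption): three descriptors, of which the
# first dominates the target, the second is weak, and the third is unused.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1]

# Stand-in model (assumption): linear least squares instead of a trained NN.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(X_in):
    return X_in @ coef

baseline, importances = permutation_importance(predict, X, y)
print("baseline R^2:", baseline)
print("importances :", importances)
```

Because the procedure only calls `predict`, it is model-agnostic: any fitted regressor, including an NN, could be substituted for the linear stand-in. A descriptor whose shuffling leaves R² essentially unchanged contributes little to the model's predictions.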