The Role of Structural Representation in the Performance of a Deep Neural Network for X-Ray Spectroscopy

Molecules. 2020 Jun 11;25(11):2715. doi: 10.3390/molecules25112715.

Abstract

An important consideration when developing a deep neural network (DNN) for the prediction of molecular properties is the representation of the chemical space. Herein we explore the effect of the representation on the performance of our DNN engineered to predict Fe K-edge X-ray absorption near-edge structure (XANES) spectra, and address the question: How important is the choice of representation for the local environment around an arbitrary Fe absorption site? Using two popular representations of chemical space, the Coulomb matrix (CM) and the pair-distribution/radial distribution curve (RDC), we investigate the effect that the choice of representation has on the performance of our DNN. While CM and RDC featurisation are demonstrably robust descriptors, it is possible to obtain a smaller mean squared error (MSE) between the target and estimated XANES spectra when using RDC featurisation, and converge to this state a) faster and b) using fewer data samples. This is advantageous for future extension of our DNN to other X-ray absorption edges, and for reoptimisation of our DNN to reproduce results from higher levels of theory. In the latter case, dataset sizes will be limited more strongly by the resource-intensive nature of the underlying theoretical calculations.
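To make the two representations concrete, the sketch below builds both for a toy Fe site. It is illustrative only, since the abstract gives neither the formulas nor the settings used. The CM follows the commonly used definition (off-diagonal Z_i Z_j / r_ij, diagonal 0.5 Z_i^2.4). The RDC is assumed here to be a sum of Gaussian-broadened pair terms weighted by Z_i Z_j. The function names, the grid (`r_max`, `n_points`), the broadening `width`, and the three-atom geometry are all hypothetical.

```python
import numpy as np


def coulomb_matrix(Z, R):
    """Coulomb matrix: Z_i*Z_j / r_ij off the diagonal, 0.5*Z_i**2.4 on it."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    # Pairwise interatomic distances, shape (n_atoms, n_atoms).
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder to avoid division by zero
    M = np.outer(Z, Z) / dist
    np.fill_diagonal(M, 0.5 * Z ** 2.4)
    return M


def radial_distribution_curve(Z, R, r_max=6.0, n_points=60, width=0.2):
    """Assumed RDC form: Gaussian-broadened pair distances weighted by Z_i*Z_j."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    grid = np.linspace(0.0, r_max, n_points)
    i, j = np.triu_indices(len(Z), k=1)  # each unique atom pair once
    d = np.linalg.norm(R[i] - R[j], axis=1)
    w = Z[i] * Z[j]
    gauss = np.exp(-((grid[None, :] - d[:, None]) ** 2) / (2.0 * width ** 2))
    return grid, (w[:, None] * gauss).sum(axis=0)


# Hypothetical toy environment: Fe at the origin, two O atoms 2.0 Angstrom away.
Z = [26, 8, 8]
R = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [-2.0, 0.0, 0.0]]

cm = coulomb_matrix(Z, R)
grid, rdc = radial_distribution_curve(Z, R)
print(cm.shape, rdc.shape)
```

In this sketch the CM grows quadratically with the number of atoms and depends on atom ordering, whereas the RDC is a fixed-length vector regardless of the number of atoms, which is one practical difference between the two when they are used as DNN inputs.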

Keywords: Coulomb matrix; K-edge; X-ray absorption spectroscopy; XANES; deep neural network; machine learning; radial distribution curve.

MeSH terms

  • Computational Biology / methods*
  • Machine Learning
  • Models, Molecular
  • Neural Networks, Computer
  • X-Ray Absorption Spectroscopy