A deep convolutional neural network for the estimation of gas chromatographic retention indices

J Chromatogr A. 2019 Dec 6:1607:460395. doi: 10.1016/j.chroma.2019.460395. Epub 2019 Jul 29.

Abstract

A deep convolutional neural network was used to estimate gas chromatographic retention indices on non-polar (polydimethylsiloxane and polydimethyl(5%-phenyl) siloxane) stationary phases. The network can be used to rank candidates when searching a mass spectral database. The model input is a linear representation of the molecular structure (SMILES notation): the input string is converted to a one-hot matrix and processed directly by the neural network. No conventional molecular descriptors are calculated. This follows the current trend in machine learning of letting the neural network find the most useful features by itself instead of relying on hand-crafted ones. The model has two 1D convolutional layers with 120 neurons each, followed by a pooling layer and a fully connected layer with 200 hidden neurons. The model was compared with state-of-the-art models for predicting gas chromatographic retention indices that are based on molecular descriptors and on functional group contributions; on several data sets it showed better accuracy together with greater versatility. Its applicability to diverse sets of flavors and fragrances, essential oils, and metabolites is shown, and its use for improving mass spectral identification when no reference retention index is available is demonstrated. Depending on the test data set, the median absolute error and the median percentage error range from 17.3 (0.93%) to 38.1 (2.15%). Ready-to-use neural network parameters are provided.
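The abstract describes converting the SMILES string into a one-hot matrix before it reaches the network. Below is a minimal Python sketch of that step, not the authors' code. The character alphabet (`ALPHABET`), the fixed padding length (`MAX_LEN`), and the helper name `one_hot` are hypothetical, and each character is treated as its own token (so two-letter atoms such as `Br` and `Cl` occupy two rows).

```python
# Hypothetical character vocabulary: the alphabet and maximum length used in
# the paper are not stated in the abstract.
ALPHABET = "CNOSPFIHBrclnos()[]=#@+-.0123456789%/\\"
MAX_LEN = 100  # assumed maximum SMILES length; shorter strings are zero-padded


def one_hot(smiles, alphabet=ALPHABET, max_len=MAX_LEN):
    """Encode a SMILES string as a max_len x len(alphabet) 0/1 matrix.

    Row i marks the character at position i; rows past the end of the
    string stay all-zero (padding).
    """
    if len(smiles) > max_len:
        raise ValueError(f"SMILES longer than {max_len} characters")
    index = {ch: i for i, ch in enumerate(alphabet)}
    matrix = [[0] * len(alphabet) for _ in range(max_len)]
    for pos, ch in enumerate(smiles):
        if ch not in index:
            raise ValueError(f"unsupported character {ch!r}")
        matrix[pos][index[ch]] = 1
    return matrix


# Example: ethanol
m = one_hot("CCO")
print(len(m), len(m[0]))          # rows = MAX_LEN, columns = alphabet size
print(m[2][ALPHABET.index("O")])  # 1: the third character is 'O'
```

Each row is one position in the string and each column one character, which is the layout a 1D convolution slides along.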
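The layer stack given in the abstract (two 1D convolutional layers with 120 neurons each, a pooling layer, and a fully connected layer with 200 hidden neurons) can be illustrated with a pure-Python forward pass. This is a sketch under stated assumptions, not the published model. The kernel size of 3, ReLU activations, global max pooling as the pooling layer, the single linear output unit, and the random initial weights are all assumptions, and every function name is hypothetical. A real model would learn its weights from training data, typically with a deep learning framework.

```python
import random


def relu(value):
    return max(0.0, value)


def conv1d(x, weights, biases):
    """Valid (unpadded) 1D convolution along the sequence axis, then ReLU.

    x:       list of L position vectors, each with C channels
    weights: list of F filters, each a K x C nested list
    biases:  list of F floats
    Returns a list of L - K + 1 position vectors with F channels.
    """
    k = len(weights[0])
    out = []
    for start in range(len(x) - k + 1):
        window = x[start:start + k]
        out.append([
            relu(b + sum(wv * xv
                         for w_row, x_row in zip(w, window)
                         for wv, xv in zip(w_row, x_row)))
            for w, b in zip(weights, biases)
        ])
    return out


def global_max_pool(x):
    # Maximum of each channel over all positions -> fixed-length vector.
    return [max(channel) for channel in zip(*x)]


def dense_relu(v, weights, biases):
    return [relu(b + sum(wi * vi for wi, vi in zip(w, v)))
            for w, b in zip(weights, biases)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(n_channels, kernel=3, filters=120, hidden=200, seed=0):
    """Random, untrained parameters; filters/hidden default to the abstract's sizes."""
    rng = random.Random(seed)
    return {
        "conv1_w": [random_matrix(kernel, n_channels, rng) for _ in range(filters)],
        "conv1_b": [0.0] * filters,
        "conv2_w": [random_matrix(kernel, filters, rng) for _ in range(filters)],
        "conv2_b": [0.0] * filters,
        "dense_w": random_matrix(hidden, filters, rng),
        "dense_b": [0.0] * hidden,
        "out_w": [rng.uniform(-0.1, 0.1) for _ in range(hidden)],
        "out_b": 0.0,
    }


def predict_ri(x, params):
    """Forward pass: conv -> conv -> pooling -> dense -> one linear output."""
    h = conv1d(x, params["conv1_w"], params["conv1_b"])
    h = conv1d(h, params["conv2_w"], params["conv2_b"])
    h = global_max_pool(h)
    h = dense_relu(h, params["dense_w"], params["dense_b"])
    return params["out_b"] + sum(w * v for w, v in zip(params["out_w"], h))
```

For example, `predict_ri(x, init_params(n_channels=len(x[0])))` passes a one-hot matrix `x` through an untrained network with the layer sizes from the abstract; the returned number is meaningless until the weights are trained. Global max pooling is used here because it produces a fixed-length vector regardless of SMILES length, which is one common design choice for sequence models of this kind.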
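The two accuracy figures quoted in the abstract can be computed as follows. This sketch assumes the percentage error is taken relative to the reference (experimental) retention index; the function names and the sample values are hypothetical and are not data from the paper.

```python
from statistics import median


def median_absolute_error(ri_true, ri_pred):
    """Median of |predicted - reference| retention index."""
    return median(abs(p - t) for t, p in zip(ri_true, ri_pred))


def median_percentage_error(ri_true, ri_pred):
    """Median of |predicted - reference| / reference, in percent (assumed definition)."""
    return median(100.0 * abs(p - t) / t for t, p in zip(ri_true, ri_pred))


# Illustrative numbers only, not results from the paper.
reference = [1000, 2000, 1500]
predicted = [1010, 1960, 1530]
print(median_absolute_error(reference, predicted))    # 30
print(median_percentage_error(reference, predicted))  # 2.0
```

Medians are less sensitive than means to a few badly predicted compounds, which is one reason to report them for retention index prediction.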

Keywords: Convolutional neural network; Deep learning; Gas chromatography; Non-target analysis; Quantitative structure-retention relationship; Retention index prediction.

MeSH terms

  • Chromatography, Gas / methods*
  • Databases, Factual
  • Gas Chromatography-Mass Spectrometry
  • Neural Networks, Computer*
  • Regression Analysis