Deep learning for retention time prediction in reversed-phase liquid chromatography

J Chromatogr A. 2022 Feb 8:1664:462792. doi: 10.1016/j.chroma.2021.462792. Epub 2021 Dec 30.

Abstract

Retention time prediction in high-performance liquid chromatography (HPLC) has been the subject of many studies because it can improve the identification of unknown molecules in untargeted profiling using HPLC coupled with high-resolution mass spectrometry. Many approaches have been developed for retention time prediction in liquid chromatography, differing in the number of molecules considered, the molecular properties used, and the machine learning algorithms applied. The recently built large retention time data set of standard compounds from the Metabolite and Chemical Entity Database (METLIN), the METLIN SMRT data set, allows researchers to create a model that can be used for retention time prediction of small molecules with a wide variety of structures and physicochemical properties. The ability to predict retention times on this large data set was studied for different deep learning architectures trained either on molecular fingerprints or on SMILES (a string representation of a molecule) encoded as one-hot matrices. The best result was achieved with a one-dimensional convolutional neural network (1D CNN) that takes SMILES as input. The proposed model reached a mean absolute error of 34.7 s and a median absolute error of 18.7 s, outperforming the results previously obtained for this data set. The 1D CNN pre-trained on the METLIN SMRT data set was transferred to five other data sets to evaluate its generalization ability.
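To make the input representation and the two reported metrics concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the character-level tokenization (which splits two-letter atoms such as Cl), the zero-padding to a fixed length, and all function names and example values are assumptions introduced here for illustration only.

```python
from statistics import median


def build_vocab(smiles_list):
    """Map every character seen in the SMILES strings to a column index.

    Assumption: character-level tokens; the paper's tokenization may differ.
    """
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i for i, ch in enumerate(chars)}


def one_hot(smiles, vocab, max_len):
    """Encode one SMILES string as a max_len x len(vocab) matrix of 0/1.

    Rows are string positions, columns are vocabulary characters.
    Positions past the end of the string stay all-zero (padding).
    """
    if len(smiles) > max_len:
        raise ValueError("SMILES string is longer than max_len")
    matrix = [[0] * len(vocab) for _ in range(max_len)]
    for pos, ch in enumerate(smiles):
        matrix[pos][vocab[ch]] = 1
    return matrix


def mean_absolute_error(y_true, y_pred):
    """Average of |observed - predicted| retention times."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)


def median_absolute_error(y_true, y_pred):
    """Median of |observed - predicted| retention times."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical molecules and retention times (in seconds), not data from the paper.
smiles_examples = ["CCO", "c1ccccc1", "CC(=O)O"]
vocab = build_vocab(smiles_examples)
encoded = one_hot("CCO", vocab, max_len=8)

observed = [100.0, 200.0, 300.0]
predicted = [110.0, 195.0, 330.0]
mae = mean_absolute_error(observed, predicted)
medae = median_absolute_error(observed, predicted)
```

A matrix of this shape (sequence positions by vocabulary size) is the kind of input a 1D CNN can convolve along the position axis. The median absolute error is less sensitive than the mean to a few badly predicted compounds, which is consistent with the reported median (18.7 s) being lower than the mean (34.7 s).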

Keywords: Deep learning; RP-HPLC; Retention time prediction.

MeSH terms

  • Chromatography, Liquid
  • Chromatography, Reverse-Phase*
  • Deep Learning*
  • Machine Learning
  • Neural Networks, Computer