Retention time prediction in hydrophilic interaction liquid chromatography with graph neural network and transfer learning

J Chromatogr A. 2021 Oct 25:1656:462536. doi: 10.1016/j.chroma.2021.462536. Epub 2021 Sep 7.

Abstract

The combination of retention time (RT), accurate mass, and tandem mass spectra can improve structural annotation in untargeted metabolomics. However, RT has received less attention for metabolite identification because available RT data are limited, especially for hydrophilic interaction liquid chromatography (HILIC). Here, a graph neural network-based transfer learning (GNN-TL) approach is proposed to train a model for HILIC RT prediction. The graph neural network was pre-trained on an in silico HILIC RT dataset (a pseudo-labeling dataset) of ∼306K molecules. The weights of the dense layers in the pre-trained GNN (pre-GNN) model were then fine-tuned by transfer learning using a small number of experimental HILIC RTs from the target chromatographic system. GNN-TL outperformed the methods implemented in Retip, including Random Forest (RF), Bayesian-regularized neural network (BRNN), XGBoost, light gradient-boosting machine (LightGBM), and Keras. It achieved the lowest mean absolute error (MAE), 38.6 s on the test set and 33.4 s on an additional test set, and it generalized best, with only a small performance difference between the training, test, and additional test sets. Furthermore, the predicted RTs filtered out nearly 60% of false-positive candidates on average, making them a valuable complement to mass spectrometry for compound identification.
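The pre-train/fine-tune scheme described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the authors' GNN-TL: the graph neural network is replaced by a single hypothetical tanh "encoder" layer acting on synthetic descriptor vectors, the pseudo-labels come from a made-up function `rt_source`, and the target chromatographic system is assumed to be a stretched and shifted version of it (`rt_target`). Only the freezing logic follows the abstract: all layers are trained on the large pseudo-labeled set, then only the dense head is updated on a few target RTs. RTs here are in arbitrary standardized units, not seconds.

```python
import numpy as np


def forward(X, p):
    """Hypothetical encoder (stand-in for the GNN layers) + dense regression head."""
    h = np.tanh(X @ p["W_enc"] + p["b_enc"])  # (n, hidden) learned representation
    y_hat = h @ p["w_head"] + p["b_head"]      # (n,) predicted RT
    return h, y_hat


def train(X, y, p, trainable, lr=0.05, epochs=2000):
    """Full-batch gradient descent on MSE; only names in `trainable` are updated."""
    n = len(y)
    for _ in range(epochs):
        h, y_hat = forward(X, p)
        d_yhat = 2.0 * (y_hat - y) / n                          # dMSE/dy_hat
        d_z = np.outer(d_yhat, p["w_head"]) * (1.0 - h ** 2)    # back through tanh
        grads = {
            "w_head": h.T @ d_yhat,
            "b_head": d_yhat.sum(),
            "W_enc": X.T @ d_z,
            "b_enc": d_z.sum(axis=0),
        }
        for name in trainable:  # frozen parameters are simply skipped
            p[name] = p[name] - lr * grads[name]
    return p


def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))


rng = np.random.default_rng(0)
n_feat, n_hidden = 8, 16
a = rng.normal(size=n_feat) / np.sqrt(n_feat)
b = rng.normal(size=n_feat) / np.sqrt(n_feat)


def rt_source(X):
    # Hypothetical "in silico" RT function that generates the pseudo-labels
    return np.sin(X @ a) + 0.3 * (X @ b)


def rt_target(X):
    # Assumed target system: RTs stretched and shifted relative to the source
    return 1.5 * rt_source(X) + 0.8


# 1) Pre-train every layer on a large pseudo-labeled set
X_src = rng.normal(size=(5000, n_feat))
p_pre = {
    "W_enc": rng.normal(size=(n_feat, n_hidden)) / np.sqrt(n_feat),
    "b_enc": np.zeros(n_hidden),
    "w_head": rng.normal(size=n_hidden) / np.sqrt(n_hidden),
    "b_head": 0.0,
}
p_pre = train(X_src, rt_source(X_src), p_pre,
              trainable=["W_enc", "b_enc", "w_head", "b_head"])

# 2) Fine-tune only the dense head on a few "experimental" target RTs
X_tr = rng.normal(size=(40, n_feat))
y_tr = rt_target(X_tr) + rng.normal(scale=0.05, size=40)
X_te = rng.normal(size=(200, n_feat))
y_te = rt_target(X_te)

p_ft = {k: np.copy(v) for k, v in p_pre.items()}  # keep p_pre untouched
p_ft = train(X_tr, y_tr, p_ft, trainable=["w_head", "b_head"])

mae_pre = mae(y_te, forward(X_te, p_pre)[1])
mae_ft = mae(y_te, forward(X_te, p_ft)[1])
print(f"target-test MAE before fine-tuning: {mae_pre:.3f}, after: {mae_ft:.3f}")
```

Freezing the encoder leaves only 17 tunable parameters in this toy (16 head weights plus one bias), which is why 40 target RTs suffice here; how well that carries over to a real HILIC system depends on how closely the in silico labels track it.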
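Candidate filtering with predicted RTs can be illustrated as a retention-time window: a candidate structure from the mass-spectral search is kept only if its predicted RT lies within a tolerance of the observed RT. The abstract does not say how the window was chosen, so the rule below (the 95th percentile of absolute errors on a validation set), the `Candidate` class, and all numbers are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Candidate:
    name: str
    predicted_rt: float  # seconds, from the RT prediction model


def rt_tolerance(val_true, val_pred, coverage=0.95):
    """Assumed rule: half-width of the window that would have kept `coverage`
    of validation compounds (a quantile of absolute prediction errors)."""
    abs_err = np.abs(np.asarray(val_true, float) - np.asarray(val_pred, float))
    return float(np.quantile(abs_err, coverage))


def filter_by_rt(observed_rt, candidates, tol):
    """Keep candidates whose predicted RT is within `tol` seconds of the observed RT."""
    return [c for c in candidates if abs(c.predicted_rt - observed_rt) <= tol]


# Hypothetical validation RTs (s) and model predictions
tol = rt_tolerance([100, 200, 300, 400, 500], [110, 190, 330, 395, 520])

candidates = [
    Candidate("cand_A", 290.0),
    Candidate("cand_B", 350.0),
    Candidate("cand_C", 320.0),
    Candidate("cand_D", 180.0),
]
kept = filter_by_rt(observed_rt=300.0, candidates=candidates, tol=tol)
removed_fraction = 1 - len(kept) / len(candidates)
print(f"tol = {tol:.1f} s, kept {[c.name for c in kept]}, removed {removed_fraction:.0%}")
```

A wider window (higher `coverage`) loses fewer true structures but removes fewer false positives; the fraction removed therefore depends on both the model's error distribution and the chosen coverage.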

Keywords: Graph neural network; HILIC RT prediction; Pseudo-labeling; Transfer learning.

MeSH terms

  • Bayes Theorem
  • Chromatography, Liquid
  • Hydrophobic and Hydrophilic Interactions
  • Machine Learning
  • Neural Networks, Computer*
  • Tandem Mass Spectrometry*