AutoMSR: Auto Molecular Structure Representation Learning for Multi-label Metabolic Pathway Prediction

IEEE/ACM Trans Comput Biol Bioinform. 2023 Nov-Dec;20(6):3430-3439. doi: 10.1109/TCBB.2022.3198119. Epub 2023 Dec 25.

Abstract

Understanding the relationship between metabolic pathways and molecular pathways is important for synthesizing new molecules, for instance when optimizing drug metabolism. In bioinformatics, multi-label prediction of metabolic pathways is a common way to study this relationship. Graph neural networks (GNNs) have become an effective method for extracting molecular structure features for multi-label prediction of metabolic pathways. Although GNNs can effectively capture structural features from molecular structure graphs, building a well-performing GNN model for a given molecular structure data set requires manually designing the GNN architecture and fine-tuning the hyperparameters, which is time-consuming and relies on expert experience. To address this challenge, we design an end-to-end automatic molecular structure representation learning framework named AutoMSR that can design the optimal GNN model for a given molecular structure data set without manual intervention. We propose a multi-seed age evolution (MSAE) search algorithm to identify the optimal GNN architecture from the GNN architecture subspace. For a given molecular structure data set, AutoMSR first uses MSAE to search for the GNN architecture and then adopts a tree-structured Parzen estimator to obtain the best hyperparameters in the hyperparameter subspace. Finally, AutoMSR automatically constructs the optimal GNN model from the best GNN architecture and hyperparameters to extract molecular structure features for multi-label metabolic pathway prediction. We evaluate AutoMSR on the real-world KEGG data set. The experimental results show that AutoMSR outperforms baseline methods on several multi-label classification evaluation metrics.
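
The architecture search step described above can be illustrated with a minimal sketch of an aging-evolution loop. This is not the authors' MSAE implementation: the abstract does not specify how the multiple seeds are chosen, so the sketch assumes that in each cycle the top `num_seeds` architectures of a random sample act as parents. The search space, the function names (`random_architecture`, `mutate`, `multi_seed_aging_evolution`), and the `toy_score` function, which stands in for the validation performance of a trained GNN, are all hypothetical.

```python
import random
from collections import deque

# Hypothetical search space; the component names are illustrative only
# and are not taken from the paper.
SEARCH_SPACE = {
    "aggregation": ["sum", "mean", "max"],
    "activation": ["relu", "tanh", "elu"],
    "hidden_dim": [32, 64, 128],
    "num_layers": [1, 2, 3],
}


def random_architecture(rng):
    """Sample one value for every architecture component."""
    return {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}


def mutate(arch, rng):
    """Return a copy of `arch` with exactly one component changed."""
    child = dict(arch)
    key = rng.choice(sorted(SEARCH_SPACE))
    alternatives = [v for v in SEARCH_SPACE[key] if v != arch[key]]
    child[key] = rng.choice(alternatives)
    return child


def toy_score(arch):
    """Stand-in for validation performance (assumption: higher is better).

    A real system would train the GNN described by `arch` and evaluate it.
    """
    target = {"aggregation": "sum", "activation": "relu",
              "hidden_dim": 128, "num_layers": 2}
    return sum(arch[k] == target[k] for k in target) / len(target)


def multi_seed_aging_evolution(score_fn, population_size=10, num_seeds=3,
                               sample_size=4, cycles=50, rng_seed=0):
    """Aging evolution in which several parents (seeds) are mutated per cycle."""
    rng = random.Random(rng_seed)
    population = deque()  # left end holds the oldest individual
    for _ in range(population_size):
        arch = random_architecture(rng)
        population.append((arch, score_fn(arch)))
    best = max(population, key=lambda item: item[1])

    for _ in range(cycles):
        sample = rng.sample(list(population), sample_size)
        seeds = sorted(sample, key=lambda item: item[1], reverse=True)[:num_seeds]
        for parent, _ in seeds:
            child = mutate(parent, rng)
            entry = (child, score_fn(child))
            population.append(entry)
            population.popleft()  # remove the oldest, not the worst
            if entry[1] > best[1]:
                best = entry
    return best


if __name__ == "__main__":
    best_arch, best_score = multi_seed_aging_evolution(toy_score)
    print(best_arch, best_score)
```

Removing the oldest individual rather than the worst is what distinguishes aging evolution from plain tournament selection. In a full pipeline, the architecture returned here would be fixed and a tree-structured Parzen estimator would then tune hyperparameters such as learning rate and dropout, a step this sketch omits.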

MeSH terms

  • Algorithms*
  • Benchmarking*
  • Computational Biology
  • Metabolic Networks and Pathways
  • Molecular Structure