Model-driven data curation pipeline for LC-MS-based untargeted metabolomics

Metabolomics. 2023 Mar 1;19(3):15. doi: 10.1007/s11306-023-01976-1.

Abstract

Introduction: There is still no community consensus on strategies for data quality review in liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics. Assessing the analytical robustness of data, which is relevant for inter-laboratory comparisons and reproducibility, remains a challenge despite the wide variety of tools available for data processing.

Objectives: The aim of this study was to provide a model to describe the sources of variation in LC-MS-based untargeted metabolomics measurements, to use it to build a comprehensive curation pipeline, and to provide quality assessment tools for data quality review.

Methods: Human serum samples (n = 392) were analyzed by ultra-performance liquid chromatography coupled to high-resolution mass spectrometry (UPLC-HRMS) using an untargeted metabolomics approach. The pipeline and tools used to process this dataset were implemented as part of TidyMS, an open-source, publicly available Python package.

Results: The model was applied to understand data curation practices used by the metabolomics community. Sources of variation that are often overlooked in untargeted metabolomics studies were identified in the analysis. New tools were used to characterize certain types of variation.

Conclusion: The developed pipeline made it possible to confirm data robustness by comparing the experimental results with the values predicted by the model. New quality control practices were introduced to assess the analytical quality of data.

Keywords: Data curation; Liquid chromatography; Mass spectrometry; Quality control practices.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromatography, Liquid
  • Data Curation*
  • Humans
  • Metabolomics*
  • Reproducibility of Results
  • Tandem Mass Spectrometry