A Python-Based Pipeline for Preprocessing LC-MS Data for Untargeted Metabolomics Workflows

Gabriel Riquelme; Nicolás Zabalegui; Pablo Marchi; Christina M Jones; María Eugenia Monge

doi:10.3390/metabo10100416

Metabolites. 2020 Oct 16;10(10):416. doi: 10.3390/metabo10100416.

Authors

Gabriel Riquelme¹ ², Nicolás Zabalegui¹ ², Pablo Marchi³, Christina M Jones⁴, María Eugenia Monge¹

Affiliations

¹ Centro de Investigaciones en Bionanociencias (CIBION), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Godoy Cruz 2390, Ciudad de Buenos Aires C1425FQD, Argentina.
² Departamento de Química Inorgánica Analítica y Química Física, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Buenos Aires C1428EGA, Argentina.
³ Facultad de Ingeniería, Universidad de Buenos Aires, Paseo Colón 850, Ciudad de Buenos Aires C1063ACV, Argentina.
⁴ Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899-8392, USA.

Abstract

Preprocessing data in a reproducible and robust way is one of the current challenges in untargeted metabolomics workflows. Data curation in liquid chromatography-mass spectrometry (LC-MS) involves the removal of biologically non-relevant features (retention time and m/z pairs) to retain only high-quality data for subsequent analysis and interpretation. The present work introduces TidyMS, a package for the Python programming language for preprocessing LC-MS data for quality control (QC) procedures in untargeted metabolomics workflows. The package is versatile and can be customized to fit the specific metabolomics application. It supports quality control procedures to ensure accuracy and reliability in LC-MS measurements, and it preprocesses metabolomics data to obtain cleaned data matrices for subsequent statistical analysis. The capabilities of the package are shown with pipelines for an LC-MS system suitability check, system conditioning, signal drift evaluation, and data curation. These applications were implemented to preprocess data corresponding to a new suite of candidate plasma reference materials developed by the National Institute of Standards and Technology (NIST; hypertriglyceridemic, diabetic, and African-American plasma pools) to be used in untargeted metabolomics studies in addition to NIST SRM 1950 Metabolites in Frozen Human Plasma. The package offers a rapid and reproducible workflow that can be used in an automated or semi-automated fashion, and it is an open and free tool available to all users.

Keywords: Python; data cleaning; data curation; preprocessing; quality control; reference materials; signal drift; system suitability; untargeted metabolomics.
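
To make the data curation idea in the abstract concrete, the following is a minimal sketch of one common curation criterion: discarding features whose intensity varies too much across repeated QC injections. It uses only pandas and does not reproduce the TidyMS API. The function name `filter_features_by_qc_cv`, the sample and feature labels, and the 0.2 threshold are all hypothetical choices made for illustration, not values taken from the paper.

```python
import pandas as pd


def filter_features_by_qc_cv(data, sample_class, qc_label="QC", max_cv=0.2):
    """Keep features whose coefficient of variation (CV) in QC samples is <= max_cv.

    data: samples x features intensity matrix (hypothetical layout).
    sample_class: Series with one class label per sample, same index as data.
    max_cv: illustrative threshold, an assumption and not a recommended value.
    """
    # Select only the rows that correspond to QC injections.
    qc_rows = data.loc[sample_class == qc_label]
    # CV = sample standard deviation / mean, computed per feature (column).
    cv = qc_rows.std(ddof=1) / qc_rows.mean()
    # Features with undefined CV (NaN) fail the comparison and are dropped.
    keep = cv[cv <= max_cv].index
    return data.loc[:, keep]


# Toy data matrix: three QC injections and two study samples.
data = pd.DataFrame(
    {
        "FT-001": [100.0, 102.0, 98.0, 150.0, 60.0],   # stable in QC samples
        "FT-002": [100.0, 200.0, 50.0, 120.0, 80.0],   # unstable in QC samples
    },
    index=["QC1", "QC2", "QC3", "S1", "S2"],
)
sample_class = pd.Series(
    ["QC", "QC", "QC", "sample", "sample"], index=data.index
)

curated = filter_features_by_qc_cv(data, sample_class)
print(list(curated.columns))
```

In this toy example only `FT-001` survives, because its QC intensities are nearly constant while those of `FT-002` vary widely. A real workflow would typically combine several such criteria (for example, blank contribution and detection rate) rather than rely on a single filter.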
