pseudoQC: A Regression-Based Simulation Software for Correction and Normalization of Complex Metabolomics and Proteomics Datasets

Proteomics. 2019 Oct;19(19):e1900264. doi: 10.1002/pmic.201900264. Epub 2019 Sep 18.

Abstract

Unwanted and uncontrollable signal variations in MS-based metabolomics and proteomics datasets severely compromise the accuracy of metabolite and protein profiling. Pooled quality control (QC) samples are therefore often employed in quality management processes, which are indispensable to the success of metabolomics and proteomics experiments, especially in high-throughput and long-term projects. However, data consistency and QC sample stability remain difficult to guarantee because of the complexity of experimental operations and differences between experimenters. Worse, many proteomics projects do not include QC samples in the initial experimental design. Herein, an interactive web-based software tool, pseudoQC, is presented that simulates QC sample data for actual metabolomics and proteomics datasets using four different machine learning-based regression methods. The simulated data are used to correct and normalize two published datasets, and the results suggest that nonlinear regression methods perform better than linear ones. The software is available as a web-based graphical user interface and can be used by scientists without a bioinformatics background. pseudoQC is open-source and freely available at https://www.omicsolution.org/wukong/pseudoQC/.
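To illustrate the general idea of regression-based signal correction with QC samples, the following is a minimal sketch on synthetic data. It is not pseudoQC's implementation: the abstract does not name the four regression methods, so a low-degree polynomial fit (NumPy's `polyfit`) stands in for them here, and the function names `correct_drift` and `cv`, the drift shape, and all numbers are assumptions made for illustration only. The sketch fits QC intensity against injection order for a single feature, divides every sample by the fitted drift curve, and compares a linear fit with a nonlinear one.

```python
import numpy as np

def correct_drift(intensity, order, qc_mask, degree):
    """Fit a polynomial of the given degree to QC intensities versus
    injection order, then divide every sample by the fitted drift curve.
    (Hypothetical helper; a stand-in for the regressors used by pseudoQC.)"""
    coeffs = np.polyfit(order[qc_mask], intensity[qc_mask], degree)
    drift = np.polyval(coeffs, order)
    # Rescale so corrected values stay at the median QC intensity level.
    return intensity / drift * np.median(intensity[qc_mask])

def cv(values):
    """Coefficient of variation: sample standard deviation over mean."""
    return np.std(values, ddof=1) / np.mean(values)

# Synthetic single-feature run; every number below is made up.
rng = np.random.default_rng(0)
order = np.arange(60, dtype=float)          # injection order
qc_mask = (order % 5 == 0)                  # assume every 5th injection is a QC
true_level = 1000.0                         # constant true abundance
drift = 1.0 - 0.3 * np.sin(order / 60 * np.pi)    # smooth nonlinear drift
noise = rng.normal(1.0, 0.01, size=order.size)    # 1% multiplicative noise
intensity = true_level * drift * noise

linear = correct_drift(intensity, order, qc_mask, degree=1)
nonlinear = correct_drift(intensity, order, qc_mask, degree=3)

cv_raw = cv(intensity)
cv_linear = cv(linear)
cv_nonlinear = cv(nonlinear)
print(f"CV raw: {cv_raw:.3f}, linear: {cv_linear:.3f}, nonlinear: {cv_nonlinear:.3f}")
```

Because the simulated drift is curved, the cubic fit removes more of it than the straight-line fit, which mirrors (but does not reproduce) the abstract's observation that nonlinear regression performed better on the two published datasets. With real data the QC rows would come either from measured pooled QC injections or, as the abstract describes, from simulated QC data.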

Keywords: machine learning; metabolomics; proteomics; pseudo-quality control; regression.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cell Line
  • Computational Biology / methods*
  • Entropy
  • Humans
  • Internet
  • Metabolome
  • Metabolomics / methods*
  • Metabolomics / statistics & numerical data
  • Proteome / metabolism
  • Proteomics / methods*
  • Proteomics / statistics & numerical data
  • Reproducibility of Results
  • Software*

Substances

  • Proteome