StatsPro: Systematic integration and evaluation of statistical approaches for detecting differential expression in label-free quantitative proteomics

J Proteomics. 2022 Jan 6;250:104386. doi: 10.1016/j.jprot.2021.104386. Epub 2021 Sep 30.

Abstract

Quantitative label-free mass spectrometry (MS) is an increasingly powerful technology for profiling thousands of proteins in complex biological samples. A primary goal of analyzing such proteomics data is to detect differentially expressed proteins (DEPs) across experimental conditions. Many statistical methods have been developed and assessed for DEP detection in various proteomics studies; however, choosing an appropriate statistical procedure remains a challenge for many proteomics scientists. In this study, we therefore organized 12 common testing algorithms and 6 P-value combination methods, computed Cohen's d effect size for every protein, and provided three evaluation criteria to help proteomics scientists systematically investigate how these choices influence DEP detection. To promote the widespread use of these methods, we developed a user-friendly web tool, StatsPro, and present two case studies involving label-free quantitative proteomics data obtained with data-dependent acquisition and data-independent acquisition to illustrate its practicality. The tool is freely available in our GitHub repository (https://github.com/YanglabWCH/StatsPro/).

SIGNIFICANCE: One of the primary goals of analyzing liquid chromatography-mass spectrometry (LC-MS) based proteomics data is to detect DEPs under different experimental conditions. Although many approaches have been proposed for detecting DEPs, there is still a scarcity of efficient, systematic, and easy-to-use tools that help proteomics scientists choose an appropriate statistical procedure. Herein, we present a new tool, StatsPro, that enables proteomics scientists to implement and evaluate different statistical methods.
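For readers unfamiliar with the two quantities named above, a minimal sketch in Python follows. It assumes Cohen's d is computed from two independent groups with the pooled standard deviation, and it uses Fisher's method as one example of a P-value combination strategy; the abstract does not state which formula variants StatsPro uses, and the function names `cohens_d` and `fisher_combine` are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups using the pooled standard deviation.

    Assumption: each group holds at least two non-missing intensities
    (e.g. log2-transformed values) for a single protein.
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def fisher_combine(p_values):
    """Combine independent P-values with Fisher's method.

    Under the null, X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom. For an even number of degrees of freedom the
    survival function has the closed form
        P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    term, total = 1.0, 1.0  # j = 0 term of the series
    for j in range(1, k):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total
```

Here `cohens_d([1, 2, 3], [4, 5, 6])` gives -3.0 and `fisher_combine([0.05, 0.05])` gives about 0.0175. Fisher's method assumes the combined P-values are independent and strictly positive, and a zero pooled standard deviation raises `ZeroDivisionError`.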
This tool offers two significant advances over existing software: a) It integrates 18 common statistical approaches (12 statistical tests and 6 P-value combination strategies) and systematically computes Cohen's d effect size; moreover, it provides a web-based interface that is convenient to operate, even for users with limited computational background. b) It supports three performance evaluation criteria (number of DEPs, correlation coefficient between P-values and effect sizes, and area under the ROC curve) for users to review the final statistical results, which may guide method selection for DEP detection.
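The three evaluation criteria can be sketched as follows. This is an illustrative Python version under stated assumptions, not StatsPro's implementation: DEPs are counted at a hypothetical P < 0.05 cutoff, the correlation is taken as Pearson's r between -log10(P) and |d|, and the AUC assumes that ground-truth labels (for example, known spike-in proteins in a benchmark dataset) are available, with each protein scored by its P-value. All function names are hypothetical.

```python
import math


def count_deps(p_values, alpha=0.05):
    """Number of proteins called differentially expressed (hypothetical cutoff alpha)."""
    return sum(1 for p in p_values if p < alpha)


def pearson(x, y):
    """Plain Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def p_effect_correlation(p_values, effect_sizes):
    """Assumed pairing: correlate -log10(P) with the absolute Cohen's d."""
    scores = [-math.log10(p) for p in p_values]
    return pearson(scores, [abs(d) for d in effect_sizes])


def auc(p_values, is_true_dep):
    """AUC as the probability that a true DEP has a smaller P-value than a non-DEP.

    Ties count one half (Mann-Whitney formulation). Requires at least one
    positive and one negative label.
    """
    pos = [p for p, t in zip(p_values, is_true_dep) if t]
    neg = [p for p, t in zip(p_values, is_true_dep) if not t]
    wins = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos < p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `auc([0.001, 0.01, 0.2, 0.5], [True, True, False, False])` returns 1.0 because both true DEPs rank ahead of both non-DEPs. The pairwise loop is O(n·m) and chosen for readability; a rank-based computation scales better to proteome-sized tables.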

Keywords: Differentially expressed proteins; Label-free analysis; Proteomics; Statistical approaches; Systematic software.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromatography, Liquid / methods
  • Mass Spectrometry / methods
  • Proteome* / analysis
  • Proteomics* / methods
  • Software

Substances

  • Proteome