Statistics in Proteomics: A Meta-analysis of 100 Proteomics Papers Published in 2019

David C L Handler; Paul A Haynes

doi:10.1021/jasms.9b00142

J Am Soc Mass Spectrom. 2020 Jul 1;31(7):1337-1343. doi: 10.1021/jasms.9b00142. Epub 2020 May 1.

Authors

David C L Handler¹, Paul A Haynes¹

Affiliation

¹ Department of Molecular Sciences, Faculty of Science and Engineering, Macquarie University, Sydney, NSW 2109, Australia.

PMID: 32324388

Abstract

We randomly selected 100 journal articles published in five proteomics journals in 2019 and manually examined each of them against a set of 13 criteria concerning the statistical analyses used, all of which were based on items mentioned in the journals' instructions to authors. These included questions such as whether a pilot study was conducted and whether false discovery rate calculation was employed at the identification or quantitation stage. The resulting data were then transformed into binary inputs, analyzed via machine learning algorithms, and classified accordingly, with the aim of determining whether clusters of data existed for specific journals or whether certain statistical measures correlated with each other. We applied a variety of methods, including principal component analysis (PCA), agglomerative clustering, and multinomial and Bernoulli naïve Bayes classification, and found that none of these could readily determine journal identity from the extracted statistical features. Logistic regression was useful for identifying strong correlations between statistical features, such as between false discovery rate criteria and multiple testing correction methods, but was similarly ineffective at determining correlations between statistical features and specific journals. This meta-analysis highlights that a very wide variety of approaches is being used in the statistical analysis of proteomics data, many of which do not conform to published journal guidelines, and that, contrary to implicit assumptions in the field, there are no clear correlations between statistical methods and specific journals.
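To illustrate how Bernoulli naïve Bayes operates on binary-encoded criteria like those described above, here is a minimal pure-Python sketch. It is not the authors' code or data. The criterion names, journal labels ("A", "B") and the six toy rows are hypothetical and synthetic, and the smoothing constant alpha = 1 is an assumption. Each paper is a 0/1 vector (criterion met or not), and the model scores each journal by its prior times the product of per-criterion Bernoulli likelihoods.

```python
import math
from collections import defaultdict


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Fit Bernoulli naive Bayes on binary feature vectors.

    X: list of equal-length 0/1 lists (one row per paper).
    y: class labels (here, hypothetical journal names).
    alpha: Laplace smoothing constant (assumed value).
    Returns (log class priors, per-class P(feature_j = 1 | class)).
    """
    n_features = len(X[0])
    counts = defaultdict(int)
    ones = defaultdict(lambda: [0] * n_features)
    for row, label in zip(X, y):
        counts[label] += 1
        for j, v in enumerate(row):
            ones[label][j] += v
    n = len(y)
    log_prior = {c: math.log(counts[c] / n) for c in counts}
    # Smoothed estimate of how often each criterion is met within each class.
    feat_prob = {
        c: [(ones[c][j] + alpha) / (counts[c] + 2 * alpha) for j in range(n_features)]
        for c in counts
    }
    return log_prior, feat_prob


def predict_bernoulli_nb(model, row):
    """Return the class with the highest log posterior score for one binary row."""
    log_prior, feat_prob = model
    best_label, best_score = None, -math.inf
    for c in log_prior:
        score = log_prior[c]
        for v, p in zip(row, feat_prob[c]):
            # Unlike multinomial NB, an unmet criterion (v = 0) also contributes, via log(1 - p).
            score += math.log(p) if v else math.log(1.0 - p)
        if score > best_score:
            best_label, best_score = c, score
    return best_label


# Hypothetical criteria, synthetic rows: pilot_study, fdr_identification,
# fdr_quantitation, multiple_testing_correction.
X = [[1, 1, 0, 1], [1, 1, 1, 1], [0, 1, 0, 1],
     [0, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0]]
y = ["A", "A", "A", "B", "B", "B"]

model = fit_bernoulli_nb(X, y)
print(predict_bernoulli_nb(model, [1, 1, 0, 1]))  # A
print(predict_bernoulli_nb(model, [0, 0, 0, 0]))  # B
```

On this toy data the two classes separate cleanly. The study's reported finding is the opposite case: real papers' feature vectors did not let such classifiers recover journal identity.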

Publication types

Meta-Analysis

MeSH terms

Biomedical Research
Humans
Proteomics*
Statistics as Topic*