Review of Issues and Solutions to Data Analysis Reproducibility and Data Quality in Clinical Proteomics

Methods Mol Biol. 2020;2051:345-371. doi: 10.1007/978-1-4939-9744-2_15.

Abstract

In any analytical discipline, data analysis reproducibility is closely interlinked with data quality. In this book chapter focused on mass spectrometry-based proteomics approaches, we describe how data analysis reproducibility and data quality influence each other and how data quality and data analysis designs can be used to increase robustness and improve reproducibility. We first introduce methods and concepts for designing and maintaining robust data analysis pipelines so that reproducibility increases in parallel. The technical aspects of data analysis reproducibility are challenging, and current ways to increase overall robustness are multifaceted; software containerization and cloud infrastructures play an important role. We also show how quality control (QC) and quality assessment (QA) approaches can be used to spot analytical issues, reduce experimental variability, and increase confidence in the analytical results of (clinical) proteomics studies, since experimental variability plays a substantial role in analysis reproducibility. To this end, we give an overview of existing solutions for QC/QA, including different quality metrics and methods for longitudinal monitoring. The efficient use of both types of approaches undoubtedly provides a way to improve the experimental reliability, reproducibility, and level of consistency in proteomics analytical measurements.
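Longitudinal QC monitoring usually means tracking a quality metric across successive instrument runs and flagging runs that drift outside expected limits. The sketch below illustrates one simple rule of this kind, in the spirit of a Shewhart control chart: control limits are fixed from a baseline period as mean ± k standard deviations. It is not the method described in the chapter. The function name `flag_out_of_control`, its parameters `n_baseline` and `k`, and the example metric (peptide identifications per run) are hypothetical. The sketch also uses the sample standard deviation of the baseline rather than the moving-range estimate of a classical individuals chart.

```python
from statistics import mean, stdev


def flag_out_of_control(values, n_baseline=5, k=3.0):
    """Return indices of runs whose QC metric lies outside mean +/- k*sd.

    Hypothetical helper: the limits are computed once from the first
    `n_baseline` runs (assumed to be a well-behaved reference period)
    and then applied to every run, including the baseline itself.
    """
    if n_baseline < 2 or len(values) < n_baseline:
        raise ValueError("need at least two baseline runs and enough values")
    baseline = values[:n_baseline]
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline
    lower, upper = center - k * spread, center + k * spread
    # Keep the run order so flagged indices map back to acquisition order.
    return [i for i, v in enumerate(values) if not lower <= v <= upper]


# Hypothetical example: peptide identifications per QC run, in acquisition order.
peptide_ids = [10250, 10310, 10180, 10290, 10220, 10260, 8900, 10240]
print(flag_out_of_control(peptide_ids))  # [6]
```

Fixing the limits from a reference period, instead of recomputing them over all runs, keeps a gradually degrading instrument from widening its own acceptance band. In practice, several metrics would be monitored together.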

Keywords: Cloud technology; Computational mass spectrometry; Large scale data analysis; Quality control approaches; Reproducible analysis pipelines.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cloud Computing*
  • Data Accuracy
  • Data Analysis*
  • Humans
  • Mass Spectrometry
  • Proteomics / methods*
  • Quality Control*
  • Reproducibility of Results
  • Software