A regression framework for assessing covariate effects on the reproducibility of high-throughput experiments

Biometrics. 2018 Sep;74(3):803-813. doi: 10.1111/biom.12832. Epub 2017 Nov 29.

Abstract

The outcome of high-throughput biological experiments is affected by many operational factors in the experimental and data-analytical procedures. Understanding how these factors affect the reproducibility of the outcome is critical for establishing workflows that produce replicable discoveries. In this article, we propose a regression framework, based on a novel cumulative link model, to assess the covariate effects of operational factors on the reproducibility of findings from high-throughput experiments. In contrast to existing graphical approaches, our method allows one to succinctly characterize the simultaneous and independent effects of covariates on reproducibility and to compare reproducibility while controlling for potential confounding variables. We also establish a connection between our model and certain Archimedean copula models. This connection not only gives our regression framework an interpretation in terms of copula models, but also provides guidance on choosing the functional form of the regression. It further opens a new way to interpret and use these copulas in the context of reproducibility. Using simulations, we show that our method yields calibrated type I error rates and is more powerful in detecting differences in reproducibility than existing measures of agreement. We illustrate the usefulness of our method using a ChIP-seq study and a microarray study.

Keywords: Copula; Correspondence curve regression; Cumulative link model; Genomics; High-throughput experiment; Reproducibility.
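
The abstract does not define the quantity being regressed, so the following is only a minimal sketch of one plausible ingredient, not the authors' implementation. It assumes that reproducibility between two replicates is summarized by an empirical correspondence curve: for each fraction t, the share of candidates ranked within the top t on both replicates. The function names `top_ranks` and `correspondence_curve`, the tie-breaking rule, and the rounding of t*n are all hypothetical choices made for illustration.

```python
def top_ranks(scores):
    """Rank candidates so that rank 1 is the largest score.

    Ties are broken by original order (an assumption of this sketch).
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def correspondence_curve(rep1, rep2, fractions):
    """Share of candidates in the top-t fraction of BOTH replicates, per t.

    rep1, rep2: significance scores for the same candidates on two replicates.
    fractions:  values of t in (0, 1] at which to evaluate the curve.
    """
    if len(rep1) != len(rep2):
        raise ValueError("replicates must score the same candidates")
    n = len(rep1)
    r1, r2 = top_ranks(rep1), top_ranks(rep2)
    curve = []
    for t in fractions:
        k = int(round(t * n))  # number of candidates in the top-t fraction
        both = sum(1 for a, b in zip(r1, r2) if a <= k and b <= k)
        curve.append(both / n)
    return curve


if __name__ == "__main__":
    scores = [9.1, 7.4, 6.8, 5.0, 3.2, 2.9, 1.5, 0.7, 0.4, 0.1]
    reversed_scores = [-s for s in scores]
    grid = [0.2, 0.5, 1.0]
    # Identical rankings: the curve follows the diagonal.
    print(correspondence_curve(scores, scores, grid))
    # Opposite rankings: no overlap until every candidate is included.
    print(correspondence_curve(scores, reversed_scores, grid))
```

Under this assumption, a regression of the kind the abstract describes would model how such a curve shifts with operational covariates through a link function; the choice of link is where the copula connection mentioned in the abstract would enter.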

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Binding Sites
  • CCCTC-Binding Factor / chemistry
  • Calibration
  • Computer Simulation
  • Confounding Factors, Epidemiologic*
  • Gene Expression Profiling / statistics & numerical data
  • High-Throughput Screening Assays / standards
  • High-Throughput Screening Assays / statistics & numerical data*
  • Humans
  • Microarray Analysis / statistics & numerical data
  • Models, Statistical
  • Regression Analysis*
  • Reproducibility of Results

Substances

  • CCCTC-Binding Factor
  • CTCF protein, human