MERIT: Controlling Monte-Carlo error rate in large-scale Monte-Carlo hypothesis testing

Stat Med. 2024 Jan 30;43(2):279-295. doi: 10.1002/sim.9959. Epub 2023 Nov 14.

Abstract

The use of Monte-Carlo (MC) $p$-values when testing the significance of a large number of hypotheses is now commonplace. In large-scale hypothesis testing, we will typically encounter at least some $p$-values near the threshold of significance, which require a larger number of MC replicates than $p$-values that are far from the threshold. As a result, some incorrect conclusions can be reached due to MC error alone; for hypotheses near the threshold, even a very large number (eg, $10^6$) of MC replicates may not be enough to guarantee that the conclusions reached using MC $p$-values agree with those based on ideal $p$-values. Gandy and Hahn (GH) [6-8] developed the only method that directly addresses this problem. They defined the Monte-Carlo error rate (MCER) to be the probability that any decision to accept or reject a hypothesis based on MC $p$-values differs from the decision based on ideal $p$-values; their method then makes decisions by controlling the MCER. Unfortunately, the GH method is frequently very conservative, often making no rejections at all and leaving a large number of hypotheses "undecided". In this article, we propose MERIT, a method for large-scale MC hypothesis testing that also controls the MCER but is more statistically efficient than the GH method. Through extensive simulation studies, we demonstrate that MERIT controls the MCER while making more decisions that agree with the ideal $p$-values than the GH method does. We also illustrate our method with an analysis of gene expression data from a prostate cancer study.
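To make the near-threshold phenomenon concrete, the following Python sketch estimates how often a single MC accept/reject decision flips relative to the decision based on the ideal $p$-value. It implements only the standard MC $p$-value with the +1 correction, not the MERIT or GH procedures, and the ideal $p$-value of 0.0499 and the other settings are illustrative assumptions rather than values from the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Standard MC p-value with the +1 correction:
#   p_hat = (1 + #{null statistics >= observed statistic}) / (B + 1)
def mc_p_value(n_exceed, n_replicates):
    return (1 + n_exceed) / (n_replicates + 1)

alpha = 0.05      # significance threshold
p_ideal = 0.0499  # hypothetical ideal p-value, just below alpha
n_sim = 100_000   # number of repeated MC experiments per setting

for b in (1_000, 10_000, 1_000_000):
    # Under the ideal p-value, the count of null statistics exceeding
    # the observed one is Binomial(B, p_ideal), so we can simulate the
    # count directly instead of drawing B replicates each time.
    n_exceed = rng.binomial(b, p_ideal, size=n_sim)
    p_mc = mc_p_value(n_exceed, b)
    flip_rate = np.mean((p_mc <= alpha) != (p_ideal <= alpha))
    print(f"B = {b:>9,}: MC decision disagrees with the ideal "
          f"decision in {flip_rate:.1%} of runs")
```

With the ideal $p$-value this close to the threshold, the simulated disagreement rate remains substantial even at $B = 10^6$ replicates, which is the abstract's point about hypotheses near the threshold.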

Keywords: bootstrap; false discovery rate; high-dimensional; permutation; reproducibility.

MeSH terms

  • Computer Simulation
  • Humans
  • Monte Carlo Method
  • Probability
  • Research Design*