Estimating the local false discovery rate via a bootstrap solution to the reference class problem

Farnoosh Abbas-Aghababazadeh; Mayer Alvo; David R Bickel

doi:10.1371/journal.pone.0206902

PLoS One. 2018 Nov 26;13(11):e0206902. doi: 10.1371/journal.pone.0206902. eCollection 2018.

Authors

Farnoosh Abbas-Aghababazadeh¹ ², Mayer Alvo¹, David R Bickel¹ ³

Affiliations

¹ Department of Mathematics and Statistics, University of Ottawa, Ottawa, Ontario, Canada.
² Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, United States of America.
³ Ottawa Institute of Systems Biology, Biochemistry, Microbiology, and Immunology Department, University of Ottawa, Ottawa, Ontario, Canada.

Abstract

Methods of estimating the local false discovery rate (LFDR) have been applied to different types of datasets, such as high-throughput biological data, diffusion tensor imaging (DTI) data, and genome-wide association (GWA) studies. We present a model for LFDR estimation that incorporates a covariate into each test. Incorporating the covariates may improve the performance of testing procedures, because each covariate contains additional information based on the biological context of the corresponding test. This method provides different estimates depending on a tuning parameter. We estimate the optimal value of that parameter by choosing the value that minimizes the estimated LFDR resulting from the bias and variance in a bootstrap approach. This estimation method is called the adaptive reference class (ARC) method. In this study, we consider the performance of the ARC method under certain assumptions on the prior probability of each hypothesis test as a function of the covariate. We prove that, under these assumptions and assuming a large covariate effect, the ARC method has a mean squared error asymptotically no greater than that of the alternative method that uses the entire set of hypotheses. In addition, we conduct a simulation study to evaluate the performance of the estimator associated with the ARC method for a finite number of hypotheses. Finally, we apply the proposed method to coronary artery disease (CAD) data taken from a GWA study and to DTI data.
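The idea of choosing a reference class by trading bias against variance can be illustrated with a minimal sketch. This is not the authors' ARC procedure: the abstract does not give its details, so everything below is an assumption made for illustration. The sketch assumes a two-group normal model with a known alternative mean (`alt_mean`), defines a reference class as the `k` tests whose covariates are closest to that of the focal test, and treats the narrowest class as the least biased when estimating bias from bootstrap replicates. The function names `estimate_lfdr` and `choose_class_size` are hypothetical.

```python
import numpy as np

def normal_pdf(z, mean=0.0):
    """Density of N(mean, 1) evaluated at z."""
    return np.exp(-0.5 * (z - mean) ** 2) / np.sqrt(2.0 * np.pi)

def estimate_lfdr(z_focal, z_class, alt_mean=3.0, n_iter=100):
    """LFDR of one focal statistic; the null proportion pi0 is fitted by EM
    on the statistics in the reference class (assumed two-group normal model)."""
    f0, f1 = normal_pdf(z_class), normal_pdf(z_class, alt_mean)
    pi0 = 0.5
    for _ in range(n_iter):
        posterior_null = pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)
        pi0 = posterior_null.mean()
    null_part = pi0 * normal_pdf(z_focal)
    return null_part / (null_part + (1.0 - pi0) * normal_pdf(z_focal, alt_mean))

def choose_class_size(z, x, focal, sizes, n_boot=100, seed=0):
    """Return the class size (from ascending `sizes`) with the smallest
    bootstrap estimate of squared bias plus variance, and all estimates."""
    rng = np.random.default_rng(seed)
    order = np.argsort(np.abs(x - x[focal]))  # nearest covariates first
    boot_means, boot_vars = [], []
    for k in sizes:
        z_class = z[order[:k]]
        reps = np.array([
            estimate_lfdr(z[focal], rng.choice(z_class, size=k, replace=True))
            for _ in range(n_boot)
        ])
        boot_means.append(reps.mean())
        boot_vars.append(reps.var(ddof=1))
    # Assumption: the narrowest class serves as the least biased reference.
    bias_sq = (np.array(boot_means) - boot_means[0]) ** 2
    mse = bias_sq + np.array(boot_vars)
    return sizes[int(np.argmin(mse))], mse

# Simulated data: the prior null probability increases with the covariate x.
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(0.0, 1.0, n)
is_null = rng.uniform(size=n) < 0.5 + 0.45 * x
z = rng.normal(np.where(is_null, 0.0, 3.0), 1.0)

best_k, mse = choose_class_size(z, x, focal=0, sizes=[100, 400, 2000])
print("chosen reference class size:", best_k)
```

A small class uses only tests with similar covariates, so its estimate tracks the local prior but is noisy; the full set of hypotheses gives a stable estimate that ignores the covariate. The sketch picks whichever candidate size has the smallest estimated error for the focal test.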

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Bias
Coronary Artery Disease / diagnostic imaging
Coronary Artery Disease / genetics
Data Interpretation, Statistical*
Datasets as Topic*
Diffusion Tensor Imaging / statistics & numerical data
Genome-Wide Association Study / statistics & numerical data
High-Throughput Screening Assays / statistics & numerical data
Humans
Probability
