Overall assessment for selected markers from high-throughput data

Woojoo Lee; Donghwan Lee; Yudi Pawitan

doi:10.1002/sim.9596

Stat Med. 2022 Dec 30;41(30):5830-5843. doi: 10.1002/sim.9596. Epub 2022 Oct 21.

Authors

Woojoo Lee¹, Donghwan Lee², Yudi Pawitan³

Affiliations

¹ Department of Public Health Science, Graduate School of Public Health, Seoul National University, Seoul, Republic of Korea.
² Department of Statistics, Ewha Womans University, Seoul, Republic of Korea.
³ Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.

PMID: 36270585
DOI: 10.1002/sim.9596

Abstract

Reproducibility, a hallmark of science, is typically assessed in validation studies. We focus on high-throughput studies in which a large number of biomarkers are measured in a training study, but only a subset of the most significant findings is selected and re-tested in a validation study. Our aim is to obtain statistical measures of overall assessment for the selected markers by integrating the information from both the training and validation studies. Naive statistical measures that ignore the non-random selection, such as the combined $P$-value from a conventional meta-analysis, are clearly biased and produce over-optimistic significance. We use the false discovery rate (FDR) concept to develop a selection-adjusted FDR (sFDR) as an overall assessment measure. We describe the link between the overall assessment and related concepts such as replicability and meta-analysis. Simulation studies and two real metabolomic datasets illustrate the application of sFDR in high-throughput data analyses.
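The selection bias described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' sFDR procedure: it assumes every marker is null, draws independent standard-normal z-scores for the training and validation studies, selects the top markers by training z-score, and compares a naive Stouffer-type combined one-sided P-value with the validation-only P-value. The function name `simulate_null_selection` and all parameter values are hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist


def simulate_null_selection(m=10000, k=100, alpha=0.05, seed=1):
    """Return (naive_rate, valid_rate) among the k selected markers.

    Assumption: all m markers are null, so any "significant" result
    is a false positive. This is an illustration of selection bias,
    not an implementation of sFDR.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1

    # Independent null z-scores in the training and validation studies.
    z_train = [rng.gauss(0.0, 1.0) for _ in range(m)]
    z_valid = [rng.gauss(0.0, 1.0) for _ in range(m)]

    # Non-random selection: keep the k largest training z-scores.
    selected = sorted(range(m), key=lambda i: z_train[i], reverse=True)[:k]

    naive_hits = 0
    valid_hits = 0
    for i in selected:
        # Naive equal-weight (Stouffer-type) combination that ignores selection.
        z_comb = (z_train[i] + z_valid[i]) / 2 ** 0.5
        p_comb = 1.0 - std_normal.cdf(z_comb)
        # P-value based on the validation study alone.
        p_valid = 1.0 - std_normal.cdf(z_valid[i])
        naive_hits += p_comb < alpha
        valid_hits += p_valid < alpha

    return naive_hits / k, valid_hits / k


if __name__ == "__main__":
    naive_rate, valid_rate = simulate_null_selection()
    print(f"naive combined P < 0.05:  {naive_rate:.2f}")
    print(f"validation-only P < 0.05: {valid_rate:.2f}")
```

Under these assumptions, the share of selected null markers declared significant by the naive combined P-value is far above the nominal level, while the validation-only share stays near it. The validation-only P-value is unaffected here because selection depends only on the training data.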

Keywords: false discovery rate; reproducibility; selection adjustment; validation study.

Publication types

Meta-Analysis
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Computer Simulation
Humans
Reproducibility of Results