Querying multiple sets of P-values through composed hypothesis testing

Tristan Mary-Huard; Sarmistha Das; Indranil Mukhopadhyay; Stéphane Robin

doi:10.1093/bioinformatics/btab592

Bioinformatics. 2021 Dec 22;38(1):141-148. doi: 10.1093/bioinformatics/btab592.

Authors

Tristan Mary-Huard¹ ², Sarmistha Das³, Indranil Mukhopadhyay³, Stéphane Robin¹ ⁴

Affiliations

¹ Mathématiques et informatique appliqués (MIA)-Paris, INRAE, AgroParisTech, Université Paris-Saclay, Paris 75231, France.
² Génétique Quantitative et Evolution (GQE)-Le Moulon, Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Gif-sur-Yvette 91190, France.
³ Human Genetics Unit, Indian Statistical Institute, Kolkata 700108, India.
⁴ Centre d'Écologie et des Sciences de la Conservation (CESCO), MNHN, CNRS, Sorbonne Université, Paris 75005, France.

PMID: 34478490
DOI: 10.1093/bioinformatics/btab592

Abstract

Motivation: Combining the results of different experiments to reveal complex patterns or to improve statistical power is a typical aim of data integration. The starting point of the statistical analysis is often a set of P-values resulting from previous analyses, which must be combined flexibly to explore complex hypotheses while guaranteeing a low proportion of false discoveries.

Results: We introduce the generic concept of composed hypothesis, which corresponds to an arbitrary complex combination of simple hypotheses. We rephrase the problem of testing a composed hypothesis as a classification task and show that finding items for which the composed null hypothesis is rejected boils down to fitting a mixture model and classifying the items according to their posterior probabilities. We show that inference can be efficiently performed and provide a thorough classification rule to control for type I error. The performance and the usefulness of the approach are illustrated in simulations and on two different applications. The method is scalable, does not require any parameter tuning, and yielded valuable biological insight in both application cases.
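
The idea of testing a composed hypothesis by fitting a mixture model and classifying items on their posterior probabilities can be illustrated with a minimal Python sketch. This is not the qch package and does not reproduce its estimation procedure. It rests on several simplifying assumptions: P-values are turned into z-scores by a probit transform, each z-score is N(0, 1) under its simple null and N(alt_mean, 1) under its simple alternative (alt_mean is an illustrative fixed value), scores are independent given the configuration, and items are rejected by a common posterior-based false discovery rule. The function names composed_h1_posteriors and classify are hypothetical.

```python
import itertools
import math
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def composed_h1_posteriors(pvalues, is_composed_h1, alt_mean=3.0, n_iter=200):
    """Posterior probability that each item belongs to the composed H1.

    pvalues: one row of Q P-values (strictly inside (0, 1)) per item.
    is_composed_h1: function mapping a 0/1 configuration tuple to True/False.
    alt_mean: ASSUMED mean of the z-scores under each simple H1.
    """
    n_tests = len(pvalues[0])
    alt = NormalDist(alt_mean, 1.0)
    # Probit transform: small P-values become large positive z-scores.
    z = [[-STD_NORMAL.inv_cdf(p) for p in row] for row in pvalues]
    # All 2^Q configurations of (H0, H1) across the Q tests.
    configs = list(itertools.product((0, 1), repeat=n_tests))
    # Density of each item under each configuration, assuming the Q scores
    # are independent once the configuration is known.
    dens = [
        [
            math.prod(
                (alt if c[j] else STD_NORMAL).pdf(row[j]) for j in range(n_tests)
            )
            for c in configs
        ]
        for row in z
    ]

    def e_step(weights):
        # Posterior probability of every configuration for every item.
        post = []
        for row in dens:
            joint = [w * d for w, d in zip(weights, row)]
            total = sum(joint)
            post.append([v / total for v in joint])
        return post

    # EM on the configuration weights only (densities are held fixed).
    weights = [1.0 / len(configs)] * len(configs)
    for _ in range(n_iter):
        post = e_step(weights)
        weights = [sum(p[k] for p in post) / len(post) for k in range(len(configs))]
    post = e_step(weights)

    # Sum the posteriors of the configurations that make up the composed H1.
    in_h1 = [is_composed_h1(c) for c in configs]
    return [sum(v for v, keep in zip(p, in_h1) if keep) for p in post]


def classify(post_h1, alpha=0.05):
    """Reject items in decreasing order of posterior probability while the
    average of (1 - posterior) over the rejected items stays <= alpha."""
    order = sorted(range(len(post_h1)), key=lambda i: -post_h1[i])
    rejected, running = [], 0.0
    for rank, i in enumerate(order, start=1):
        running += 1.0 - post_h1[i]
        if running / rank > alpha:
            break
        rejected.append(i)
    return rejected


# Illustrative query: "H1 holds in at least two of the three tests".
toy_pvalues = [[1e-4, 1e-4, 1e-4]] * 10 + [[0.5, 0.5, 0.5]] * 40
toy_posteriors = composed_h1_posteriors(toy_pvalues, lambda c: sum(c) >= 2)
toy_rejected = classify(toy_posteriors, alpha=0.05)
```

Changing the predicate passed as is_composed_h1 changes the query (for example, lambda c: all(c) for "H1 in every test") without refitting anything else, which is the flexibility the composed-hypothesis formulation is meant to provide.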

Availability and implementation: The QCH methodology is available in the qch package hosted on CRAN. Additionally, R code to reproduce the Einkorn example is available on the personal webpage of the first author: https://www6.inrae.fr/mia-paris/Equipes/Membres/Tristan-Mary-Huard.

Supplementary information: Supplementary data are available at Bioinformatics online.
