Tests for comparison of multiple endpoints with application to omics data

Stat Appl Genet Mol Biol. 2018 Jan 30;17(1). doi: 10.1515/sagmb-2017-0033.

Abstract

In biomedical research, multiple endpoints are commonly analyzed in "omics" fields such as genomics, proteomics and metabolomics. Traditional methods designed for low-dimensional data either perform poorly or are not applicable to high-dimensional data, where the number of endpoints is typically similar to, or even much larger than, the number of subjects. The complex biochemical interplay between hundreds (or thousands) of endpoints gives rise to complex dependence relations among them. This paper proposes tests that are well suited to omics data: they do not require the normality assumption, and they remain powerful for small sample sizes, in the presence of complex dependence relations among endpoints, and when the number of endpoints is much larger than the number of subjects. Unbiasedness and consistency of the tests are proved, and their size and power are assessed numerically. The proposed approach, based on the nonparametric combination of dependent interpoint distance tests, is shown to be very effective. Applications to genomics and metabolomics are discussed.
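
The general idea of combining dependent interpoint distance tests by permutation can be illustrated with a minimal sketch. It is not the procedure of the paper, whose test statistics are not given in this abstract. The energy-type statistic, the use of Manhattan and Euclidean distances as the partial tests, Fisher's combining function, and all function and parameter names below are assumptions made for the example.

```python
import numpy as np

def pairwise_distances(data, order):
    """Minkowski interpoint distances of the given order between all rows."""
    diff = np.abs(data[:, None, :] - data[None, :, :])
    return (diff ** order).sum(axis=2) ** (1.0 / order)

def energy_statistic(dist, labels):
    """Twice the mean between-group distance minus both mean within-group
    distances (zero diagonal entries included). Large values suggest that
    the two groups differ. Assumed statistic, chosen for illustration."""
    a = labels == 0
    b = labels == 1
    between = dist[np.ix_(a, b)].mean()
    within_a = dist[np.ix_(a, a)].mean()
    within_b = dist[np.ix_(b, b)].mean()
    return 2.0 * between - within_a - within_b

def npc_interpoint_test(x, y, orders=(1, 2), n_perm=999, seed=0):
    """Permutation test combining one partial test per distance order.

    x, y: arrays of shape (subjects, endpoints) for the two groups.
    Returns the combined p-value and the observed partial p-values.
    """
    rng = np.random.default_rng(seed)
    data = np.vstack([x, y])
    labels = np.r_[np.zeros(len(x), dtype=int), np.ones(len(y), dtype=int)]
    # Distances do not depend on the group labels, so compute them once.
    dists = [pairwise_distances(data, o) for o in orders]

    # Row 0 holds the observed statistics, rows 1..n_perm the permuted ones.
    # The same label permutation feeds every partial test, which keeps the
    # dependence between the partial tests intact.
    stats = np.empty((n_perm + 1, len(orders)))
    stats[0] = [energy_statistic(d, labels) for d in dists]
    for b in range(1, n_perm + 1):
        perm = rng.permutation(labels)
        stats[b] = [energy_statistic(d, perm) for d in dists]

    # Partial p-value of every row against the whole permutation distribution:
    # the fraction of rows whose statistic is at least as large.
    partial_p = (stats[None, :, :] >= stats[:, None, :]).mean(axis=1)

    # Fisher's combining function applied row by row, then the combined
    # p-value is read off the permutation distribution of the combined values.
    combined = -2.0 * np.log(partial_p).sum(axis=1)
    global_p = (combined >= combined[0]).mean()
    return global_p, partial_p[0]

# Simulated example: 8 cases and 8 controls, 500 endpoints, non-normal data.
rng = np.random.default_rng(42)
cases = rng.exponential(size=(8, 500)) + 0.5
controls = rng.exponential(size=(8, 500))
p_value, partial = npc_interpoint_test(cases, controls)
print("combined p-value:", p_value, "partial p-values:", partial)
```

Because the statistic depends on the data only through the matrix of distances between subjects, the sketch runs unchanged when the number of endpoints far exceeds the number of subjects, and permuting group labels avoids any normality assumption.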

Keywords: biomarker; case-control study; high-dimensional data; metabolomics; nonparametric tests.

MeSH terms

  • Adenocarcinoma / genetics
  • Adenocarcinoma / metabolism
  • Biomarkers / urine
  • Case-Control Studies
  • Colonic Neoplasms / genetics
  • Colonic Neoplasms / metabolism
  • Data Interpretation, Statistical*
  • Genomics / statistics & numerical data*
  • Humans
  • Infertility, Male / metabolism
  • Infertility, Male / urine
  • Male
  • Metabolomics / statistics & numerical data*
  • Proteomics / statistics & numerical data*
  • Sample Size

Substances

  • Biomarkers