Approaches to Sample Size Determination for Multivariate Data: Applications to PCA and PLS-DA of Omics Data

J Proteome Res. 2016 Aug 5;15(8):2379-93. doi: 10.1021/acs.jproteome.5b01029. Epub 2016 Jul 7.

Abstract

Sample size determination is a fundamental step in the design of experiments. Methods for sample size determination are abundant for univariate analysis methods but scarce in the multivariate case. Omics data are multivariate in nature and are commonly investigated using multivariate statistical methods, such as principal component analysis (PCA) and partial least-squares discriminant analysis (PLS-DA). No simple approaches to sample size determination exist for PCA and PLS-DA. In this paper, we introduce important concepts and offer strategies for estimating the (minimally) required sample size when planning experiments to be analyzed using PCA and/or PLS-DA.
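To make the link between sample size and loading estimation concrete, the following is a minimal simulation sketch. It is not the procedure proposed in the paper. It assumes a hypothetical one-component model (a single true loading vector plus unit-variance noise), and the names `simulate`, `covariance`, `first_loading`, and `mean_alignment`, as well as the values of `p`, `signal_var`, and `reps`, are illustrative choices. The sketch estimates the first PCA loading by power iteration on the sample covariance matrix and reports how closely it aligns with the assumed true loading as the number of samples grows.

```python
import math
import random


def simulate(n, p, signal_var, rng):
    """Draw n observations of p variables: one latent component plus unit-variance noise."""
    true_loading = [1.0 / math.sqrt(p)] * p  # assumed true loading, unit length
    rows = []
    for _ in range(n):
        score = rng.gauss(0.0, math.sqrt(signal_var))
        rows.append([score * v + rng.gauss(0.0, 1.0) for v in true_loading])
    return rows, true_loading


def covariance(rows):
    """Sample covariance matrix (p x p) of the mean-centered data."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_loading(cov, iters=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    p = len(cov)
    # Arbitrary start vector with alternating signs (an illustrative choice).
    w = [(-1.0) ** j * (j + 1) for j in range(p)]
    for _ in range(iters):
        w = [sum(cov[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return w


def mean_alignment(n, p=10, signal_var=5.0, reps=20, seed=0):
    """Average |cosine| between estimated and true first loading over reps simulations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        rows, truth = simulate(n, p, signal_var, rng)
        estimate = first_loading(covariance(rows))
        total += abs(sum(a * b for a, b in zip(estimate, truth)))
    return total / reps


if __name__ == "__main__":
    for n in (5, 20, 100, 500):
        print(n, round(mean_alignment(n), 3))
```

Under these assumptions, the alignment approaches 1 as the sample size increases. A planning exercise along these lines would pick the smallest sample size whose simulated alignment meets a chosen threshold, with the model parameters replaced by values informed by pilot data.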

Keywords: covariance estimation; dimensionality; eigenvalue distribution; hypothesis testing; loading estimation; multivariate analysis; power analysis; random matrix theory.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Discriminant Analysis
  • Humans
  • Least-Squares Analysis
  • Metabolomics / statistics & numerical data*
  • Multivariate Analysis*
  • Principal Component Analysis
  • Sample Size*
  • Serum / chemistry
  • Serum / metabolism
  • Swine
  • Urine / chemistry