A unified strategy to rebalance multifactorial designs with unequal group sizes: application to analysis of variance multiblock orthogonal partial least squares

Anal Chim Acta. 2023 Jul 4:1263:341284. doi: 10.1016/j.aca.2023.341284. Epub 2023 Apr 25.

Abstract

Background: Adequately handling unbalanced groups remains one of the major challenges in the analysis of multivariate data collected from multifactorial experimental designs. While partial least squares-based methods, such as analysis of variance multiblock orthogonal partial least squares (AMOPLS), can offer better discrimination between factor levels, they can be more strongly affected by this issue, and unbalanced experimental designs may lead to substantial confusion between effects. Even state-of-the-art analysis of variance (ANOVA) decomposition methodologies based on general linear models (GLM) cannot efficiently disentangle these sources of variation when combined with AMOPLS.
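To make the problem concrete, here is a minimal toy sketch (not taken from the paper) of a classical, observation-weighted ANOVA decomposition of an unbalanced 2 × 2 design. The design, the noise-free data, and the helper name `classical_effects` are hypothetical. Each effect matrix assigns every observation the difference between its level mean and the grand mean; with unequal, non-proportional cell sizes, the factor A and factor B matrices are no longer orthogonal, and the A effect absorbs part of the B effect.

```python
import numpy as np

def classical_effects(X, a, b):
    """Observation-weighted main-effect matrices (hypothetical helper).

    Each row of X_A (resp. X_B) holds the mean of its factor-A (resp. B)
    level minus the grand mean of all observations.
    """
    mu = X.mean(axis=0)
    XA = np.zeros_like(X)
    XB = np.zeros_like(X)
    for lev in np.unique(a):
        XA[a == lev] = X[a == lev].mean(axis=0) - mu
    for lev in np.unique(b):
        XB[b == lev] = X[b == lev].mean(axis=0) - mu
    return XA, XB

# Toy unbalanced 2x2 design: cell sizes 5, 2, 2, 5 (not proportional).
counts = {(0, 0): 5, (0, 1): 2, (1, 0): 2, (1, 1): 5}
a = np.concatenate([np.full(n, i) for (i, j), n in counts.items()])
b = np.concatenate([np.full(n, j) for (i, j), n in counts.items()])
# Noise-free single variable whose true A and B effects are both +/-5.
X = (10.0 * a + 10.0 * b).reshape(-1, 1)

XA, XB = classical_effects(X, a, b)
overlap = float(np.sum(XA * XB))  # Frobenius inner product <X_A, X_B>
print(round(overlap, 2))          # 306.12: the effect matrices overlap
print(np.unique(XA.round(3)))     # +/-7.143 (i.e. +/-50/7), not +/-5
```

With balanced or proportional cell sizes the same inner product would be zero; here the B effect leaks into the estimate of A, which is the kind of mixing of effects the abstract describes.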

Results: A versatile solution, developed as an extension of a prior rebalancing strategy, is proposed for the first, ANOVA-based decomposition step. This approach has the advantage of yielding unbiased parameter estimates and retaining the within-group variation in the rebalanced design, while preserving the orthogonality of the effect matrices, even in the presence of unequal group sizes. This property is of utmost importance for model interpretation because it avoids mixing sources of variation related to the different effects in the design. A real case study involving metabolomic data from in vitro toxicological experiments was used to demonstrate the potential of this strategy to handle unequal group sizes using a supervised approach. Primary 3D rat neural cell cultures were exposed to trimethyltin following a multifactorial design of experiments involving three fixed-effect factors.
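The general idea of equal-weight estimation can be sketched as follows, assuming a two-factor design in which every cell is non-empty. This is not the published rebalancing algorithm; the helper names `cell_mean_effects` and `within_cell_residuals` and the toy data are hypothetical. Effects are computed from unweighted cell means (for a full-factorial linear model with sum-to-zero coding these coincide with the least-squares parameter estimates), and each cell contributes one row with the same weight, which makes the main-effect and interaction matrices mutually orthogonal. Within-cell residuals are kept as a separate observation-level matrix; they sum to zero in every cell and are therefore orthogonal to any effect matrix that is constant within cells.

```python
import numpy as np

def cell_mean_effects(X, a, b):
    """Equal-weight two-way decomposition from cell means (hypothetical helper).

    Returns one row per (i, j) cell, ordered i-major, so every cell carries
    the same weight regardless of how many observations it contains.
    """
    la, lb = np.unique(a), np.unique(b)
    # Cell means, shape (levels_a, levels_b, variables); no cell may be empty.
    M = np.array([[X[(a == i) & (b == j)].mean(axis=0) for j in lb] for i in la])
    mu = M.mean(axis=(0, 1))                  # unweighted grand mean
    alpha = M.mean(axis=1) - mu               # factor A effects, sum to zero
    beta = M.mean(axis=0) - mu                # factor B effects, sum to zero
    gamma = M - mu - alpha[:, None, :] - beta[None, :, :]  # interaction
    na, nb, p = M.shape
    XA = np.repeat(alpha, nb, axis=0)         # row i*nb + j holds alpha[i]
    XB = np.tile(beta, (na, 1))               # row i*nb + j holds beta[j]
    XAB = gamma.reshape(na * nb, p)           # row i*nb + j holds gamma[i, j]
    return XA, XB, XAB

def within_cell_residuals(X, a, b):
    """Observation-level deviations from each cell mean (within-group variation)."""
    E = X.astype(float).copy()
    for i in np.unique(a):
        for j in np.unique(b):
            cell = (a == i) & (b == j)
            E[cell] -= X[cell].mean(axis=0)
    return E

def inner(U, V):
    """Frobenius inner product; zero means the two matrices are orthogonal."""
    return float(np.sum(U * V))

# Toy unbalanced 2x2 design: cell sizes 5, 2, 2, 5.
counts = {(0, 0): 5, (0, 1): 2, (1, 0): 2, (1, 1): 5}
a = np.concatenate([np.full(n, i) for (i, j), n in counts.items()])
b = np.concatenate([np.full(n, j) for (i, j), n in counts.items()])
rng = np.random.default_rng(0)
# Variable 1 has true A and B effects of +/-5; variable 2 is noise only.
signal = np.column_stack([10.0 * a + 10.0 * b, np.zeros(a.size)])
X = signal + rng.normal(0.0, 0.5, size=(a.size, 2))

XA, XB, XAB = cell_mean_effects(X, a, b)
E = within_cell_residuals(X, a, b)
print(inner(XA, XB), inner(XA, XAB), inner(XB, XAB))  # all ~0 up to rounding
print(XA[:, 0].round(2))  # close to -5, -5, 5, 5 despite unequal cell sizes
```

The abstract does not specify how the residuals are merged into the rebalanced design, so this sketch stops at computing them alongside the equal-weight effect matrices.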

Significance and novelty: The rebalancing strategy was shown to be a novel and effective solution for handling unbalanced experimental designs, offering unbiased parameter estimators and orthogonal submatrices, thus avoiding confusion between effects and facilitating model interpretation. Moreover, it can be combined with any multivariate method used for the analysis of high-dimensional data collected from multifactorial designs.

Keywords: AMOPLS; ANOVA; High-dimensional; Rebalancing; Supervised; Unbalanced experimental designs.

MeSH terms

  • Analysis of Variance
  • Animals
  • Least-Squares Analysis
  • Linear Models
  • Metabolomics*
  • Rats
  • Research Design*
  • Sulfadiazine

Substances

  • Sulfadiazine