Endurance test selection optimized via sample size predictions

J Appl Physiol (1985). 2020 Sep 1;129(3):467-473. doi: 10.1152/japplphysiol.00408.2020. Epub 2020 Jul 30.

Abstract

Selecting the most appropriate performance test is critical for detecting the effect of an intervention. In this investigation, we 1) used time-trial (TT) performance data to estimate sample size requirements for test selection and 2) demonstrated the differences in statistical power between a repeated-measures ANOVA (RM-ANOVA) and an analysis of covariance (ANCOVA) for detecting an effect in a parallel-group design. A retrospective analysis of six altitude studies was completed, totaling 105 volunteers. We quantified test-retest reliability [i.e., intraclass correlation coefficient (ICC) and standard error of measurement (SEM)] and then calculated the standardized effect size for a 5-20% change in TT performance. With these outcomes, a power analysis was performed, and required sample sizes were compared among performance tests. Relative to TT duration, the 11.2-km run had the lowest between-subject variance and thus the greatest statistical power (i.e., required the smallest sample size) to detect a given percent change in performance. However, the 3.2-km run was the most reliable test (ICC: 0.89, SEM: 81 s) and thus better suited to detect the smallest absolute (i.e., seconds) change in performance. When TT durations were similar, a running modality (11.2-km run; ICC: 0.83, SEM: 422 s) was far more reliable than cycling (720-kJ cycle; ICC: 0.77, SEM: 480 s). In all scenarios, ANCOVA provided greater statistical power than RM-ANOVA. Our results suggest that running tests (3.2 km and 11.2 km) analyzed with ANCOVA provide the greatest likelihood of detecting a significant change in performance in response to an intervention, particularly in populations unaccustomed to cycling.

NEW & NOTEWORTHY This is the first investigation to use time-trial (TT) data from previous studies in simulations to estimate statistical power.
We developed an easy-to-use decision aid detailing the required sample size needed to detect a given change in TT performance for the purpose of test selection. Furthermore, our detailed methods can be applied to any scenario in which there is an impact of a stressor and the desire to detect a treatment effect.
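The link between reliability, effect size, and sample size described above can be illustrated with a short sketch. It is not the authors' simulation code; it assumes a two-arm pre/post parallel-group design, equal SD at both time points, the test-retest ICC used as a stand-in for the pre/post correlation, and a normal approximation for power. Under these assumptions, the group × time interaction of a 2 × 2 RM-ANOVA is equivalent to a t-test on change scores (error variance 2σ²(1 − ρ)), while ANCOVA on the post-test with baseline as covariate has error variance σ²(1 − ρ²). The function names and the TT values in the usage block are hypothetical.

```python
import math
from statistics import NormalDist


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement via the common formula SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def n_per_group(delta: float, sd: float, rho: float,
                alpha: float = 0.05, power: float = 0.80,
                method: str = "ancova") -> int:
    """Approximate n per group for a two-arm pre/post parallel-group design.

    delta  -- true between-group difference to detect (same units as sd)
    sd     -- SD of the outcome, assumed equal at pre and post
    rho    -- pre/post correlation (the test-retest ICC is used as a stand-in)
    method -- "ancova": post-test adjusted for baseline, error variance sd^2 * (1 - rho^2)
              "change": change-score t-test, equivalent to the group x time
                        interaction of a 2x2 RM-ANOVA, error variance 2 * sd^2 * (1 - rho)
    Normal approximation only (no small-sample t correction).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)           # desired power
    if method == "ancova":
        var_e = sd ** 2 * (1 - rho ** 2)
    elif method == "change":
        var_e = 2 * sd ** 2 * (1 - rho)
    else:
        raise ValueError(f"unknown method: {method!r}")
    # Two-sample formula: n = 2 (z_alpha + z_beta)^2 * error variance / delta^2
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * var_e / delta ** 2)


if __name__ == "__main__":
    # Hypothetical TT values for illustration only (not taken from the study).
    mean_time_s, sd_s, icc = 3000.0, 300.0, 0.83
    print(f"SEM = {sem_from_icc(sd_s, icc):.0f} s")
    for pct in (0.05, 0.10, 0.20):
        delta = pct * mean_time_s  # a given percent change in mean TT time, in seconds
        n_anc = n_per_group(delta, sd_s, icc, method="ancova")
        n_chg = n_per_group(delta, sd_s, icc, method="change")
        print(f"{pct:.0%} change: ANCOVA n/group = {n_anc}, "
              f"RM-ANOVA (change score) n/group = {n_chg}")
```

Two points follow from this sketch under its assumptions. First, for a percent change the standardized effect is pct × mean / SD, so a test with low between-subject variance relative to its duration needs fewer volunteers. Second, the ratio of the two error variances is (1 − ρ²) / (2(1 − ρ)) = (1 + ρ)/2 ≤ 1, so ANCOVA never requires more participants than the change-score RM-ANOVA, which is consistent with the direction of the finding reported above.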

Keywords: decision aid; exercise performance; hypoxia; test-retest reliability.

MeSH terms

  • Exercise Test
  • Humans
  • Reproducibility of Results
  • Research Design
  • Retrospective Studies
  • Running*
  • Sample Size