Endurance test selection optimized via sample size predictions

Roy M Salgado; Aaron R Caldwell; Kirsten E Coffman; Samuel N Cheuvront; Robert W Kenefick

doi:10.1152/japplphysiol.00408.2020


J Appl Physiol (1985). 2020 Sep 1;129(3):467-473. doi: 10.1152/japplphysiol.00408.2020. Epub 2020 Jul 30.

Authors

Roy M Salgado¹, Aaron R Caldwell¹, Kirsten E Coffman¹, Samuel N Cheuvront², Robert W Kenefick¹

Affiliations

¹ Thermal and Mountain Medicine Division, US Army Research Institute of Environmental Medicine, Natick, Massachusetts.
² Biophysics and Biomedical Modeling Division, US Army Research Institute of Environmental Medicine, Natick, Massachusetts.

Abstract

Selecting the most appropriate performance test is critical in detecting the effect of an intervention. In this investigation we 1) used time-trial (TT) performance data to estimate sample size requirements for test selection and 2) demonstrated the differences in statistical power between a repeated-measures ANOVA (RM-ANOVA) and analysis of covariance (ANCOVA) for detecting an effect in a parallel-group design. A retrospective analysis of six altitude studies was completed, totaling 105 volunteers. We quantified the test-retest reliability [i.e., intraclass correlation coefficient (ICC) and standard error of measurement (SEM)] and then calculated the standardized effect size for a 5-20% change in TT performance. With these outcomes, a power analysis was performed and required sample sizes were compared among performance tests. Relative to TT duration, the 11.2-km run had the lowest between-subject variance, and thus the greatest statistical power (i.e., required the smallest sample size) to detect a given percent change in performance. However, the 3.2-km run was the most reliable test (ICC: 0.89, SEM: 81 s) and thus better suited to detect the smallest absolute (i.e., seconds) change in performance. When TT durations were similar, a running modality (11.2-km run; ICC: 0.83, SEM: 422 s) was more reliable than cycling (720-kJ cycle; ICC: 0.77, SEM: 480 s). In all scenarios, the ANCOVA provided greater statistical power than the RM-ANOVA. Our results suggest that running tests (3.2 km and 11.2 km) analyzed with ANCOVA provide the greatest likelihood of detecting a significant change in performance in response to an intervention, particularly in populations unaccustomed to cycling.

NEW & NOTEWORTHY This is the first investigation to utilize time-trial (TT) data from previous studies in simulations to estimate statistical power. We developed an easy-to-use decision aid detailing the required sample size needed to detect a given change in TT performance for the purpose of test selection. Furthermore, our detailed methods can be applied to any scenario in which there is an impact of a stressor and the desire to detect a treatment effect.
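The link between reliability, effect size, and required sample size described in the abstract can be illustrated with a short sketch. This is not the authors' simulation procedure. It is a closed-form normal approximation under stated assumptions: SEM is taken as SD × sqrt(1 − ICC), the ICC is used as the pre/post correlation `rho`, and the RM-ANOVA group-by-time test in a two-group pre/post design is treated as equivalent to comparing change scores. The function names and all parameters are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sem_from_icc(sd_between: float, icc: float) -> float:
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - ICC)."""
    return sd_between * sqrt(1.0 - icc)


def n_per_group(delta: float, sd: float, rho: float, method: str,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-arm pre/post design.

    delta  : absolute change to detect (same units as sd, e.g. seconds)
    sd     : between-subject SD of the outcome
    rho    : pre/post correlation (assumed here to equal the ICC)
    method : "ancova" (baseline as covariate) or "change" (change scores,
             assumed equivalent to the RM-ANOVA interaction test)
    """
    normal = NormalDist()
    z = normal.inv_cdf(1.0 - alpha / 2.0) + normal.inv_cdf(power)
    if method == "ancova":
        # Residual variance after adjusting for baseline
        var_factor = 1.0 - rho ** 2
    elif method == "change":
        # Variance of a post-minus-pre difference score
        var_factor = 2.0 * (1.0 - rho)
    else:
        raise ValueError("method must be 'ancova' or 'change'")
    return ceil(2.0 * z ** 2 * sd ** 2 * var_factor / delta ** 2)
```

Under these assumptions, `n_per_group(delta=0.05 * mean_tt, sd=sd_tt, rho=icc, method="ancova")` with hypothetical `mean_tt`, `sd_tt`, and `icc` values gives the approximate per-group sample size for a 5% change. Because 1 − rho² never exceeds 2(1 − rho) for rho between 0 and 1, the ANCOVA estimate is never larger than the change-score estimate, which is consistent with the direction of the power difference reported in the abstract.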

Keywords: decision aid; exercise performance; hypoxia; test-retest reliability.

MeSH terms

Exercise Test
Humans
Reproducibility of Results
Research Design
Retrospective Studies
Running*
Sample Size