To have value, comparisons of high-throughput phenotyping methods need statistical tests of bias and variance

Front Plant Sci. 2024 Jan 19;14:1325221. doi: 10.3389/fpls.2023.1325221. eCollection 2023.

Abstract

The gap between genomics and phenomics is narrowing. The rate at which it is narrowing, however, is being slowed by improper statistical comparison of methods. Pearson's correlation coefficient (r) is commonly used to assess method quality, but it is often misleading for this purpose because it provides no information about the relative quality of the two methods. Using r can both erroneously discount methods that are inherently more precise and validate methods that are less accurate. These errors arise from logical flaws inherent in using r to compare methods, not from limited sample size or the unavoidable possibility of a type I error. A popular alternative to r is to measure the limits of agreement (LOA). However, both r and LOA fail to identify which instrument is more or less variable than the other and can lead to incorrect conclusions about method quality. An alternative approach, comparing the variances of the methods, requires repeated measurements of the same subject but avoids these incorrect conclusions. Variance comparison is arguably the most important component of method validation and thus, when repeated measurements are possible, it adds considerable value to these studies. The statistical tests for comparing variances presented here are well established, easy to interpret, and widely available. The widespread use of r has potentially led to numerous incorrect conclusions about method quality, hampering development. The approach described here is useful for advancing high-throughput phenotyping methods but extends to any branch of science. Adopting the statistical techniques outlined in this paper will help speed the adoption of new high-throughput phenotyping techniques by indicating when one should reject a new method, replace an old method outright, or use a new method conditionally.
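As a concrete illustration of the contrast drawn in the abstract, the following minimal Python sketch (not from the article; the simulated data, noise levels, and the specific pooled-variance F-ratio test are illustrative assumptions) compares two hypothetical instruments applied to the same subjects using Pearson's r, Bland-Altman bias with 95% limits of agreement, and a comparison of the methods' replicate-measurement variances, which requires repeated measurements of each subject.

# Minimal sketch: contrasting r, Bland-Altman LOA, and a variance comparison
# for two simulated measurement methods A and B. Assumes normally distributed
# measurement error; one of several valid variance tests is shown.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_reps = 30, 3
true_value = rng.normal(50.0, 10.0, size=n_subjects)          # subject-level truth

# Hypothetical instruments: A is noisier (error sd = 3) than B (error sd = 1).
a = true_value[:, None] + rng.normal(0.0, 3.0, size=(n_subjects, n_reps))
b = true_value[:, None] + rng.normal(0.0, 1.0, size=(n_subjects, n_reps))

# 1. Pearson's r between the methods' subject means.
r, _ = stats.pearsonr(a.mean(axis=1), b.mean(axis=1))

# 2. Bland-Altman: bias and 95% limits of agreement of the paired differences.
diff = a.mean(axis=1) - b.mean(axis=1)
bias = diff.mean()
loa = (bias - 1.96 * diff.std(ddof=1), bias + 1.96 * diff.std(ddof=1))

# 3. Variance comparison: pool the within-subject replicate variances for each
#    method and test their ratio against an F distribution.
var_a = a.var(axis=1, ddof=1).mean()           # pooled error variance, method A
var_b = b.var(axis=1, ddof=1).mean()           # pooled error variance, method B
df = n_subjects * (n_reps - 1)                 # error degrees of freedom per method
f_stat = var_a / var_b
p = 2 * min(stats.f.sf(f_stat, df, df), stats.f.cdf(f_stat, df, df))  # two-sided

print(f"Pearson r = {r:.3f}")
print(f"Bland-Altman bias = {bias:.2f}, 95% LOA = ({loa[0]:.2f}, {loa[1]:.2f})")
print(f"F = {f_stat:.2f} (df {df}, {df}), two-sided p = {p:.4g}")

In this simulation the correlation between the two methods remains high even though method A has three times the error standard deviation of method B, and the limits of agreement do not indicate which method contributes the extra variability; only the variance comparison, made possible by the repeated measurements, identifies the noisier instrument.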

Keywords: Bland and Altman; bias; limits of agreement; physical sciences; statistics; method comparison; variance.

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was funded in part by the Bill and Melinda Gates Foundation grant titled “RIPE—Realizing increased photosynthetic efficiency for sustainable increases in crop yield” (OPP1060461), the Advanced Research Projects Agency of the U.S. Department of Energy (DE-AR0000598), and the Agricultural Research Service of the United States Department of Agriculture.