Evaluating the Quality of Ecoinformatics Data Derived From Commercial Agriculture: A Repeatability Analysis of Pest Density Estimates

J Econ Entomol. 2021 Aug 5;114(4):1842-1846. doi: 10.1093/jee/toab127.

Abstract

Each year, consultants and field scouts working in commercial agriculture undertake a massive, decentralized data collection effort as they monitor insect populations to make real-time pest management decisions. These data, if integrated into a database, offer rich opportunities for applying big data or ecoinformatics methods in agricultural entomology research. However, questions have been raised about whether or not the underlying quality of these data is sufficiently high to be a foundation for robust research. Here I suggest that repeatability analysis can be used to quantify the quality of data collected from commercial field scouting, without requiring any additional data gathering by researchers. In this context, repeatability quantifies the proportion of total variance across all insect density estimates that is explained by differences across populations and is thus a measure of the underlying reliability of observations. Repeatability was moderately high for cotton fields scouted commercially for total Lygus hesperus Knight densities (R = 0.631) and further improved by accounting for observer effects (R = 0.697). Repeatabilities appeared to be somewhat lower than those computed for a comparable, but much smaller, researcher-generated data set. In general, the much larger sizes of ecoinformatics data sets are likely to more than compensate for modest reductions in measurement precision. Tools for evaluating data quality are important for building confidence in the growing applications of ecoinformatics methods.
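The definition of repeatability given above, among-population variance as a proportion of total variance, can be illustrated with a minimal sketch. The abstract does not state which estimator was used, so the classical one-way ANOVA estimator for a balanced design is assumed here purely for illustration; the function name `repeatability` and the example densities are hypothetical and are not taken from the study.

```python
from statistics import mean


def repeatability(groups):
    """Estimate repeatability R = var_among / (var_among + var_within).

    Assumption (not from the source): a balanced design, where each inner
    list holds the same number of repeated density estimates for one
    population (e.g., one field), analysed with one-way ANOVA.
    """
    k = len(groups)        # number of populations
    n = len(groups[0])     # repeated estimates per population
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need >= 2 groups, each with the same size >= 2")

    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Mean squares among and within populations.
    ms_among = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (k * (n - 1))

    # Variance components; a negative among-population estimate is set to 0.
    var_among = max((ms_among - ms_within) / n, 0.0)
    var_within = ms_within

    total = var_among + var_within
    return var_among / total if total > 0 else 0.0


# Hypothetical example: two fields, two density estimates each.
r = repeatability([[1.0, 2.0], [3.0, 4.0]])
```

Under this estimator, R approaches 1 when repeated estimates of the same population agree closely relative to the differences among populations, and approaches 0 when most of the variation lies among repeated estimates of the same population. Accounting for observer effects, as the abstract describes, would require an additional term in the model and is not covered by this sketch.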

Keywords: Lygus hesperus; data quality; data veracity; observer effect; repeatability analysis.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Agriculture*
  • Animals
  • Heteroptera*
  • Insecta
  • Pest Control
  • Reproducibility of Results