Characteristic features of statistical models and machine learning methods derived from pest and disease monitoring datasets

R Soc Open Sci. 2023 Jun 28;10(6):230079. doi: 10.1098/rsos.230079. eCollection 2023 Jun.

Abstract

While many studies have used traditional statistical methods to analyse monitoring data and predict the future population dynamics of crop pests and diseases, a growing number of studies have used machine learning methods. The characteristic features of these methods have not been fully elucidated or systematically organized. We compared the prediction performance of two statistical and seven machine learning methods using 203 monitoring datasets recorded over several decades on four major crops in Japan, with meteorological and geographical information as the explanatory variables. The decision tree and random forest, both machine learning methods, were found to be the most efficient, while the regression models, whether statistical or machine learning, were relatively inferior. The two best methods performed better on biased and scarce data, while the statistical Bayesian model performed better on larger datasets. Therefore, researchers should consider data characteristics when selecting the most appropriate method.
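
As an illustration only, the kind of comparison described in the abstract can be sketched in a few lines of Python. This is not the authors' pipeline: the data are synthetic, the two models (a one-variable least-squares regression and a depth-1 regression tree) are simplified stand-ins for the regression and tree-based families, and the function names `fit_linear`, `fit_stump`, `cv_rmse` and `compare_methods` are hypothetical. The sketch assumes that prediction performance is scored by k-fold cross-validated root mean squared error, which the abstract does not specify.

```python
import math
import random


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a predictor function."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def fit_stump(xs, ys):
    """Depth-1 regression tree: one threshold, mean response on each side."""
    pairs = sorted(zip(xs, ys))
    overall = sum(ys) / len(ys)
    best = (float("inf"), None, overall, overall)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    _, threshold, ml, mr = best
    if threshold is None:
        return lambda x: overall  # no valid split: predict the overall mean
    return lambda x: ml if x <= threshold else mr


def cv_rmse(fit, xs, ys, k=5):
    """k-fold cross-validated root mean squared error of a fitting function."""
    n = len(xs)
    sq_err = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th observation is held out
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)
        sq_err += sum((model(xs[i]) - ys[i]) ** 2 for i in test_idx)
    return math.sqrt(sq_err / n)


def compare_methods(seed=0, n=120):
    """Score both models on synthetic data (hypothetical, not from the study)."""
    rng = random.Random(seed)
    # Assumed toy relationship: pest counts jump once temperature exceeds 20.
    temps = [rng.uniform(10.0, 30.0) for _ in range(n)]
    counts = [(50.0 if t > 20.0 else 5.0) + rng.gauss(0.0, 3.0) for t in temps]
    return {
        "linear": cv_rmse(fit_linear, temps, counts),
        "stump": cv_rmse(fit_stump, temps, counts),
    }


results = compare_methods()
for name, rmse in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {rmse:.2f}")
```

The toy response is a threshold effect, which a tree-style split can capture and a straight line cannot, so the ranking here is built into the synthetic data and says nothing about the study's actual datasets. It only shows the mechanics of scoring several methods on the same held-out observations.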

Keywords: crop disease; crop pest; machine learning; statistical model.

Associated data

  • figshare/10.6084/m9.figshare.c.6699912