Characteristic features of statistical models and machine learning methods derived from pest and disease monitoring datasets

Shigeki Kishi; Jianqiang Sun; Akira Kawaguchi; Sunao Ochi; Megumi Yoshida; Takehiko Yamanaka

doi:10.1098/rsos.230079

R Soc Open Sci. 2023 Jun 28;10(6):230079. doi: 10.1098/rsos.230079. eCollection 2023 Jun.

Authors

Shigeki Kishi¹, Jianqiang Sun¹, Akira Kawaguchi¹,², Sunao Ochi¹,³, Megumi Yoshida¹,³, Takehiko Yamanaka¹

Affiliations

¹ Research Center for Agricultural Information and Technology, National Agriculture and Food Research Organization, 105-0003, 2-14-1 Kowa Nishi-Shimbashi Building, Nishi-Shimbashi, Minato, Tokyo, Japan.
² Western Region Agricultural Research Center (Kinki, Chugoku and Shikoku Regions), National Agriculture and Food Research Organization, 721-0975, 6-12-1 Nishi-Fukatsu, Fukuyama, Hiroshima, Japan.
³ Institute for Plant Protection, National Agriculture and Food Research Organization, 305-8666, 2-1-18 Kannon-dai, Tsukuba, Ibaraki, Japan.

Abstract

Many studies have analysed monitoring data with traditional statistical methods to predict the future population dynamics of crop pests and diseases, and a growing number now use machine learning methods. However, the characteristic features of these methods have not been fully elucidated or systematically organized. We compared the prediction performance of two statistical and seven machine learning methods using 203 monitoring datasets recorded over several decades on four major crops in Japan, with meteorological and geographical information as the explanatory variables. The decision tree and random forest, both machine learning methods, were the most efficient, whereas the regression models, both statistical and machine learning, were relatively inferior. These two best-performing methods were better suited to biased and scarce data, whereas the statistical Bayesian model performed better with larger datasets. Researchers should therefore consider data characteristics when selecting the most appropriate method.

Keywords: crop disease; crop pest; machine learning; statistical model.

Associated data

figshare/10.6084/m9.figshare.c.6699912