Impacts of meteorological variables and machine learning algorithms on rice yield prediction in Korea

Subin Ha; Yong-Tak Kim; Eun-Soon Im; Jina Hur; Sera Jo; Yong-Seok Kim; Kyo-Moon Shim

doi:10.1007/s00484-023-02544-x

Int J Biometeorol. 2023 Nov;67(11):1825-1838. doi: 10.1007/s00484-023-02544-x. Epub 2023 Sep 5.

Authors

Subin Ha^#¹, Yong-Tak Kim^#¹, Eun-Soon Im² ³, Jina Hur⁴, Sera Jo⁴, Yong-Seok Kim⁴, Kyo-Moon Shim⁴

Affiliations

¹ Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong SAR, China.
² Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong SAR, China. ceim@ust.hk.
³ Division of Environment and Sustainability, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong SAR, China. ceim@ust.hk.
⁴ National Institute of Agricultural Sciences, Rural Development Administration, Wanju-gun, Jeollabuk-do, Korea.

^# Contributed equally.

PMID: 37667047
DOI: 10.1007/s00484-023-02544-x

Abstract

As crop productivity is greatly influenced by weather conditions, many attempts have been made to estimate crop yields from meteorological data, and these have progressed considerably with the development of machine learning. However, most yield prediction models are developed based on observational data, and the utilization of climate model output in yield prediction has been addressed in very few studies. In this study, we estimate rice yields in South Korea using the meteorological variables provided by ERA5 reanalysis data (ERA-O) and its dynamically downscaled data (ERA-DS). After ERA-O and ERA-DS are validated against observations (OBS), two different machine learning models, Support Vector Machine (SVM) and Long Short-Term Memory (LSTM), are trained with different combinations of eight meteorological variables (mean temperature, maximum temperature, minimum temperature, precipitation, diurnal temperature range, solar irradiance, mean wind speed, and relative humidity) obtained from OBS, ERA-O, and ERA-DS at weekly and monthly timescales from May to September. Regardless of the model type and the source of the input data, training a model with weekly datasets leads to better yield estimates than training with monthly datasets. LSTM generally outperforms SVM, especially when the model is trained with ERA-DS data at a weekly timescale. The best yield estimates are produced by the LSTM model trained with all eight variables at a weekly timescale. Altogether, this study shows the significance of high spatial and temporal resolution in the input meteorological data for yield prediction, which also helps substantiate the added value of dynamical downscaling.
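
To make the weekly versus monthly distinction concrete, the sketch below aggregates a daily series over the May to September season at both timescales. It is an illustration only, not the authors' pipeline: the daily values are synthetic, the function names are hypothetical, and the definition of a "week" as consecutive 7-day windows starting on 1 May is an assumption that the abstract does not specify. Diurnal temperature range is computed here as maximum minus minimum temperature.

```python
from collections import defaultdict
from datetime import date, timedelta


def growing_season_days(year):
    """Return every date from 1 May to 30 September of the given year."""
    start, end = date(year, 5, 1), date(year, 9, 30)
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def aggregate(daily, key):
    """Average daily values within the groups defined by key(date)."""
    groups = defaultdict(list)
    for day, value in daily.items():
        groups[key(day)].append(value)
    # Sort by group key so the output is in chronological order.
    return [sum(v) / len(v) for _, v in sorted(groups.items())]


def monthly_means(daily):
    """One value per calendar month."""
    return aggregate(daily, key=lambda d: d.month)


def weekly_means(daily):
    """One value per 7-day window counted from the first day (assumption)."""
    first = min(daily)
    return aggregate(daily, key=lambda d: (d - first).days // 7)


# Synthetic daily maximum and minimum temperature (degrees C), for illustration only.
days = growing_season_days(2020)
tmax = {d: 25.0 + 0.05 * i for i, d in enumerate(days)}
tmin = {d: 15.0 + 0.02 * i for i, d in enumerate(days)}

# Diurnal temperature range as the daily difference Tmax - Tmin.
dtr = {d: tmax[d] - tmin[d] for d in days}

weekly_dtr = weekly_means(dtr)
monthly_dtr = monthly_means(dtr)

print(len(days), len(weekly_dtr), len(monthly_dtr))
```

Under these assumptions, the 153-day season gives 22 weekly steps (the last window covers only 6 days) against 5 monthly steps, so a sequence model such as an LSTM sees a much longer input sequence per variable at the weekly timescale.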

Keywords: Dynamical downscaling; Machine learning model; Rice yield prediction.

Grants and funding

PJ014882/Rural Development Administration