Gradient boosting machine learning model to predict aflatoxins in Iowa corn

Front Microbiol. 2023 Sep 1;14:1248772. doi: 10.3389/fmicb.2023.1248772. eCollection 2023.

Abstract

Introduction: Aflatoxin (AFL), a secondary metabolite produced by filamentous fungi, contaminates corn and poses significant health and safety hazards to humans and livestock through its toxic and carcinogenic effects. Corn is an essential commodity for food, feed, fuel, and export markets; AFL mitigation is therefore necessary to ensure food and feed safety in the United States (US) and elsewhere. In this case study, an Iowa-centric model was developed to predict AFL contamination from historical corn contamination, meteorological, satellite, and soil property data in Iowa, the largest corn-producing state in the US.

Methods: We evaluated gradient boosting machine (GBM) learning with feature engineering for predicting AFL in Iowa corn at two risk thresholds for high contamination events: 20 ppb and 5 ppb. Data from 2010, 2011, 2012, and 2021 (n = 630) were split 90%-10% into training and testing sets, and data from 2020 (n = 376) were used for independent validation.
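
The thresholding and 90%-10% split described in the Methods can be sketched as follows. This is a minimal illustration rather than the study's pipeline: the function names, the sample values, the strict "greater than" rule for a high contamination event, and the simple random (unstratified) shuffle are all assumptions made for the example.

```python
import random


def label_high_risk(afl_ppb, threshold_ppb):
    """Return 1 for a high contamination event, else 0.

    Assumption: "high" means strictly above the threshold; the abstract
    does not say whether the boundary value itself counts as high.
    """
    return 1 if afl_ppb > threshold_ppb else 0


def split_train_test(samples, test_fraction=0.10, seed=0):
    """Shuffle samples and return (train, test) lists.

    Assumption: a simple random split; the abstract does not say
    whether the split was stratified by class or by year.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical AFL measurements in ppb (not data from the study).
afl_ppb = [0.0, 1.2, 3.5, 5.0, 6.8, 12.0, 19.9, 20.0, 25.4, 110.0]

# The same measurements give different class balances per threshold.
labels_20 = [label_high_risk(v, 20) for v in afl_ppb]
labels_5 = [label_high_risk(v, 5) for v in afl_ppb]

# Split hypothetical sample indices for the 630 training-era records.
train_idx, test_idx = split_train_test(range(630))
```

Lowering the threshold from 20 ppb to 5 ppb moves more samples into the high-risk class, which changes how imbalanced the classification problem is.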

Results: At the 20-ppb risk threshold, the GBM model had an overall accuracy of 96.77% but a balanced accuracy of only 50.00%; at the 5-ppb threshold, it had an overall accuracy of 90.32% and a balanced accuracy of 64.88%. The model had low power to detect high AFL contamination events, that is, low sensitivity. The satellite-acquired vegetative index in August significantly improved the prediction of corn contamination at the end of the growing season for both risk thresholds. Prediction of high AFL contamination levels was linked to aflatoxin risk indices (ARI) in May. ARI in July, however, was an influential factor for the 5-ppb threshold but not for the 20-ppb threshold. Similarly, latitude was an influential factor for the 20-ppb threshold but not for the 5-ppb threshold. Soil saturated hydraulic conductivity (Ksat) influenced both risk thresholds.
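
The gap between overall and balanced accuracy follows from class imbalance, since balanced accuracy is the mean of sensitivity and specificity. The sketch below illustrates this with hypothetical confusion-matrix counts. The counts were chosen only because they reproduce the reported percentages. They are not taken from the paper, and the function name is likewise an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity, and balanced accuracy
    from the four confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of high-AFL events detected
    specificity = tn / (tn + fp)  # share of low-AFL samples cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts (assumption): every rare high-AFL event is missed.
m20 = classification_metrics(tp=0, fn=2, tn=60, fp=0)

# Hypothetical counts (assumption): a third of high-AFL events are caught.
m5 = classification_metrics(tp=2, fn=4, tn=54, fp=2)

for name, m in (("20-ppb", m20), ("5-ppb", m5)):
    print(name, {k: round(v * 100, 2) for k, v in m.items()})
```

With these illustrative counts, a classifier that never predicts a high contamination event still scores high overall accuracy, while its balanced accuracy stays at 50%, the chance level for a two-class problem.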

Discussion: These AFL prediction models are practical to develop and implement in commodity grain handling environments, supporting preventative rather than reactive mitigation. Identifying the predictors that influence AFL risk each year provides a cost-effective risk management tool and is therefore a high priority for managing hazards and making optimal use of the nation's corn crop.

Keywords: Iowa; aflatoxin; corn; feed safety; gradient boosting; prediction modeling.