Machine Learning Strategies for the Retrieval of Leaf-Chlorophyll Dynamics: Model Choice, Sequential Versus Retraining Learning, and Hyperspectral Predictors

Yoseline Angel; Matthew F McCabe

doi:10.3389/fpls.2022.722442


Front Plant Sci. 2022 Mar 11;13:722442. doi: 10.3389/fpls.2022.722442. eCollection 2022.

Authors

Yoseline Angel¹, Matthew F McCabe¹

Affiliation

¹ Hydrology, Agriculture and Land Observation Group, Water Desalination and Reuse Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia.

Abstract

Monitoring leaf chlorophyll (Chl) in situ is labor-intensive, which limits the representative sampling needed to map Chl variability at field scales across time. Unmanned aerial vehicles (UAVs) and hyperspectral cameras provide flexible platforms for observing agricultural systems and overcome this spatio-temporal sampling constraint. Here, we evaluate a customized machine learning (ML) workflow that retrieves multi-temporal leaf-Chl levels by combining sub-centimeter-resolution UAV hyperspectral imagery (400-1,000 nm) with leaf-level reflectance spectra and SPAD measurements. The workflow is designed to capture temporal correlations, select relevant predictors, and retrieve accurate results under different conditions. The study was performed within a phenotyping experiment monitoring the development of wild tomato plants. Several analyses were conducted to evaluate multiple ML strategies, including: (1) exploring sequential versus retraining learning; (2) comparing the insights gained from using 272 spectral bands versus 60 pigment-based vegetation indices (VIs); and (3) assessing six regression methods (linear regression, partial least squares regression (PLSR), decision trees, support vector regression, ensemble trees, and Gaussian process regression (GPR)). Goodness-of-fit (R²) and accuracy metrics (mean absolute error, MAE; root mean square error, RMSE) were determined using training/testing and validation data subsets to assess model performance.
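For readers who want to reproduce the evaluation step, the three metrics named above can be computed as in the minimal sketch below. It assumes R² is the coefficient of determination, 1 − SS_res/SS_tot; some studies instead report the squared Pearson correlation, so treat this choice as an assumption rather than the authors' exact formula. The observed and predicted SPAD values in the usage block are made-up toy numbers, not data from the study.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    """Reject empty or misaligned inputs before computing any metric."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, expressed in the target's units (here SPAD units)."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, also in target units; penalizes large errors more than MAE."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (dimensionless).

    Assumption: this is the definition used for "goodness-of-fit"; it can be
    negative when predictions are worse than simply predicting the mean.
    """
    _check(y_true, y_pred)
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy SPAD readings (hypothetical) and model retrievals for four leaves.
    observed = [40.0, 45.0, 50.0, 55.0]
    predicted = [42.0, 44.0, 51.0, 53.0]
    print(mae(observed, predicted))             # 1.5
    print(round(rmse(observed, predicted), 4))  # 1.5811
    print(round(r2(observed, predicted), 4))    # 0.92
```

Because MAE and RMSE share the units of the target, differences between models can be read directly in SPAD units, which is how the accuracy gains are reported in the results.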
Overall, while PLSR, GPR, and random forest performed equally well, the results show that: (1) the retraining strategy improved the ability of most approaches to model SPAD-based Chl dynamics; (2) comparing the distributions of the retrievals and the validation data revealed how well the models captured Chl dynamics through SPAD levels; (3) VI predictors slightly improved R² (e.g., from 0.59 to 0.74 for GPR) and accuracy (e.g., MAE and RMSE differences of up to 2 SPAD units) for specific algorithms; and (4) the feature importance examined through these methods revealed strong overlaps between relevant bands and VI predictors, highlighting a few decisive spectral ranges and indices useful for retrieving leaf-Chl levels. The proposed ML framework enables the retrieval of high-quality, spatially distributed, multi-temporal SPAD-based chlorophyll maps at ultra-high pixel resolution (e.g., 7 mm).
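To make the distinction between band predictors and VI predictors concrete, the minimal sketch below derives two pigment-sensitive indices from a single reflectance spectrum. The abstract does not list the 60 indices used, so the choice of NDVI and a red-edge chlorophyll index (R_NIR / R_red-edge − 1), the band centres of 670, 710, 780, and 800 nm, and the hypothetical helper `vi_predictors` are all illustrative assumptions. Nearest-band lookup is one simple way to map an index formula onto a sensor with many narrow bands.

```python
from typing import Dict, Sequence


def nearest_reflectance(wavelengths: Sequence[float], reflectance: Sequence[float],
                        target_nm: float) -> float:
    """Return the reflectance of the band whose centre wavelength is closest to target_nm."""
    if len(wavelengths) == 0 or len(wavelengths) != len(reflectance):
        raise ValueError("wavelengths and reflectance must be non-empty and aligned")
    idx = min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - target_nm))
    return reflectance[idx]


def ndvi(wavelengths: Sequence[float], reflectance: Sequence[float],
         nir_nm: float = 800.0, red_nm: float = 670.0) -> float:
    """Normalized difference vegetation index from the nearest NIR and red bands (assumed centres)."""
    nir = nearest_reflectance(wavelengths, reflectance, nir_nm)
    red = nearest_reflectance(wavelengths, reflectance, red_nm)
    return (nir - red) / (nir + red)


def ci_red_edge(wavelengths: Sequence[float], reflectance: Sequence[float],
                nir_nm: float = 780.0, red_edge_nm: float = 710.0) -> float:
    """Red-edge chlorophyll index, R_NIR / R_red-edge - 1 (assumed band centres)."""
    nir = nearest_reflectance(wavelengths, reflectance, nir_nm)
    red_edge = nearest_reflectance(wavelengths, reflectance, red_edge_nm)
    return nir / red_edge - 1.0


def vi_predictors(wavelengths: Sequence[float], reflectance: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: bundle several VIs into one named predictor set for a regressor."""
    return {
        "NDVI": ndvi(wavelengths, reflectance),
        "CI_red_edge": ci_red_edge(wavelengths, reflectance),
    }


if __name__ == "__main__":
    # Toy four-band spectrum (made-up values), standing in for one pixel or leaf spectrum.
    wl = [670.0, 710.0, 780.0, 800.0]
    refl = [0.05, 0.20, 0.45, 0.48]
    print(vi_predictors(wl, refl))
```

Applied per pixel, a function like this turns a 272-band cube into a much smaller stack of index layers, which is the kind of reduced predictor set the band-versus-VI comparison contrasts with the full spectrum.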

Keywords: SPAD – leaf greenness; UAV; chlorophyll; digital phenotyping; hyperspectral image; machine learning; multitemporal analyses; vegetation indices.