A spatiotemporal ensemble machine learning framework for generating land use/land cover time-series maps for Europe (2000-2019) based on LUCAS, CORINE and GLAD Landsat

PeerJ. 2022 Jul 21:10:e13573. doi: 10.7717/peerj.13573. eCollection 2022.

Abstract

A spatiotemporal machine learning framework for automated prediction and analysis of long-term Land Use/Land Cover (LULC) dynamics is presented. The framework includes: (1) harmonization and preprocessing of spatial and spatiotemporal input datasets (GLAD Landsat, NPP/VIIRS), including five million harmonized LUCAS and CORINE Land Cover-derived training samples, (2) model building based on spatial k-fold cross-validation and hyper-parameter optimization, (3) prediction of the most probable class, class probabilities and model variance of predicted probabilities per pixel, (4) LULC change analysis on the time-series of produced maps. The spatiotemporal ensemble model consists of a random forest, a gradient boosted tree classifier, and an artificial neural network, with a logistic regressor as meta-learner. The results show that the most important variables for mapping LULC in Europe are: seasonal aggregates of Landsat green and near-infrared bands, multiple Landsat-derived spectral indices, long-term surface water probability, and elevation. Spatial cross-validation of the model indicates consistent performance across multiple years with overall accuracy (a weighted F1-score) of 0.49, 0.63, and 0.83 when predicting 43 (level-3), 14 (level-2), and five classes (level-1). Additional experiments show that spatiotemporal models generalize better to unknown years, outperforming single-year models on known-year classification by 2.7% and on unknown-year classification by 3.5%. Results of the accuracy assessment using 48,365 independent test samples show an 87% match with the validation points. Results of time-series analysis (time-series of LULC probabilities and NDVI images) suggest forest loss in large parts of Sweden, the Alps, and Scotland. Positive and negative trends in NDVI in general match the land degradation and land restoration classes, with "urbanization" showing the most negative NDVI trend.
An advantage of using spatiotemporal ML is that the fitted model can be used to predict LULC in years that were not included in its training dataset, allowing generalization to past and future periods, e.g. to predict LULC for years prior to 2000 and beyond 2020. The generated LULC time-series data stack (ODSE-LULC), including the training points, is publicly available via the ODSE Viewer. Functions used to prepare data and run modeling are available via the eumap library for Python.
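
The model-building and prediction steps described above (spatial k-fold cross-validation, three base learners stacked under a logistic-regression meta-learner, per-sample class probabilities with a spread across learners, and a weighted F1-score) can be illustrated with a minimal sketch. It uses scikit-learn rather than the eumap library, and it is not the authors' implementation: the synthetic data, the `tiles` grouping variable, the hyper-parameters, and the use of the standard deviation across base learners as the "model variance" are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GroupKFold, cross_val_predict
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the training samples (NOT the LUCAS/CORINE points).
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)

# Hypothetical tile id per sample; samples in one tile never straddle a split.
tiles = np.arange(len(y)) % 10

# Hold out two whole tiles as a spatially separate test set.
test_mask = tiles >= 8
X_tr, y_tr, g_tr = X[~test_mask], y[~test_mask], tiles[~test_mask]
X_te, y_te = X[test_mask], y[test_mask]

# Base learners: random forest, gradient boosted trees, neural network.
base_learners = [
    RandomForestClassifier(n_estimators=50, random_state=0),
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
]

# Out-of-fold class probabilities from spatial (grouped) k-fold CV.
cv = GroupKFold(n_splits=4)
oof = [
    cross_val_predict(m, X_tr, y_tr, groups=g_tr, cv=cv, method="predict_proba")
    for m in base_learners
]

# Meta-learner: logistic regression fitted on the stacked probabilities.
meta_X_train = np.hstack(oof)
meta = LogisticRegression(max_iter=1000).fit(meta_X_train, y_tr)

# Refit each base learner on all training tiles, then predict the test tiles.
test_probs = [m.fit(X_tr, y_tr).predict_proba(X_te) for m in base_learners]
final_proba = meta.predict_proba(np.hstack(test_probs))
y_pred = meta.classes_[final_proba.argmax(axis=1)]

# Assumed uncertainty measure: spread of base-learner probabilities per class.
proba_std = np.stack(test_probs).std(axis=0)

score = f1_score(y_te, y_pred, average="weighted")
print(f"weighted F1 on held-out tiles: {score:.2f}")
```

Grouping the folds by tile keeps nearby samples out of both the training and validation side of a split, which is the reason for preferring spatial over random cross-validation when samples are spatially autocorrelated.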

Keywords: Big data; Ensemble; Environmental monitoring; Land use/land cover; Landsat; Machine learning; Probability; Spatial analysis; Spatiotemporal; Uncertainty.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Environmental Monitoring*
  • Europe
  • Probability
  • Time Factors
  • Urbanization*

Grants and funding

This work was supported by CEF Telecom project 2018-EU-IA-0095 and is co-financed by the Innovation and Networks Executive Agency (INEA). This research was also funded by the CERES project, by the Science Fund of the Republic of Serbia – Program for Development of Projects in the Field of Artificial Intelligence, and by the AgriCapture Horizon 2020 Research and Innovation programme under Grant agreement No. 101004282. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.