Forest-Scale Phenotyping: Productivity Characterisation Through Machine Learning

Front Plant Sci. 2020 Mar 6:11:99. doi: 10.3389/fpls.2020.00099. eCollection 2020.

Abstract

Advances in remote sensing combined with the emergence of sophisticated methods for large-scale data analytics from the field of data science provide new methods to model complex interactions in biological systems. Using a data-driven philosophy, insights from experts are used to corroborate the results generated through analytical models instead of leading the model design. Following such an approach, this study outlines the development and implementation of a whole-of-forest phenotyping system that incorporates spatial estimates of productivity across a large plantation forest. In large-scale plantation forestry, improving the productivity and consistency of future forests is an important but challenging goal due to the multiple interactions between biotic and abiotic factors, the long breeding cycle, and the high variability of growing conditions. Forest phenotypic expression is highly affected by the interaction of environmental conditions and forest management but the understanding of this complex dynamics is incomplete. In this study, we collected an extensive set of 2.7 million observations composed of 62 variables describing climate, forest management, tree genetics, and fine-scale terrain information extracted from environmental surfaces, management records, and remotely sensed data. Using three machine learning methods, we compared models of forest productivity and evaluate the gain and Shapley values for interpreting the influence of categorical variables on the power of these methods to predict forest productivity at a landscape level. The most accurate model identified that the most important drivers of productivity were, in order of importance, genetics, environmental conditions, leaf area index, topology, and soil properties, thus describing the complex interactions of the forest. This approach demonstrates that new methods in remote sensing and data science enable powerful, landscape-level understanding of forest productivity. The phenotyping method developed here can be used to identify superior and inferior genotypes and estimate a productivity index for individual site. This approach can improve tree breeding and deployment of the right genetics to the right site in order to increase the overall productivity across planted forests.

Keywords: GPU-acceleration; LIDAR. forestry; decision trees; gradient boosting; phenotyping.