A tale of two "forests": random forest machine learning aids tropical forest carbon mapping

Joseph Mascaro; Gregory P Asner; David E Knapp; Ty Kennedy-Bowdoin; Roberta E Martin; Christopher Anderson; Mark Higgins; K Dana Chadwick

doi:10.1371/journal.pone.0085993

PLoS One. 2014 Jan 28;9(1):e85993. doi: 10.1371/journal.pone.0085993. eCollection 2014.

Authors

Joseph Mascaro¹, Gregory P Asner¹, David E Knapp¹, Ty Kennedy-Bowdoin¹, Roberta E Martin¹, Christopher Anderson¹, Mark Higgins¹, K Dana Chadwick¹

Affiliation

¹ Department of Global Ecology, Carnegie Institution for Science, Stanford, California, United States of America.

Abstract

Accurate and spatially explicit maps of tropical forest carbon stocks are needed to implement carbon offset mechanisms such as REDD+ (Reduced Deforestation and Degradation Plus). The Random Forest machine learning algorithm may aid carbon mapping applications using remotely sensed data. However, Random Forest has never been compared to traditional and potentially more reliable techniques such as regionally stratified sampling and upscaling, and it has rarely been employed with spatial data. Here, we evaluated the performance of Random Forest in upscaling airborne LiDAR (Light Detection and Ranging)-based carbon estimates compared to the stratification approach over a 16-million hectare focal area of the Western Amazon. We considered two runs of Random Forest, one with and one without spatial contextual modeling; the former included x and y position directly in the model. In each case, we set aside 8 million hectares (i.e., half of the focal area) for validation; this rigorous test of Random Forest went beyond the internal validation normally compiled by the algorithm (called "out-of-bag"), which proved insufficient for this spatial application. In this heterogeneous region of Northern Peru, the model with spatial context was the best-performing run of Random Forest, and explained 59% of the variation in LiDAR-based carbon estimates within the validation area, compared to 37% for stratification or 43% for Random Forest without spatial context. With the 60% improvement in explained variation, RMSE against validation LiDAR samples improved from 33 to 26 Mg C ha⁻¹ when using Random Forest with spatial context. Our results suggest that spatial context should be considered when using Random Forest, and that doing so may result in substantially improved carbon stock modeling for purposes of climate change mitigation.
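The validation figures in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the `samples` layout, and the idea of splitting the focal area at a single x coordinate are all assumptions made for illustration; the abstract does not say how the 8-million-hectare validation half was delineated. The arithmetic at the end assumes the "60% improvement" is measured relative to the stratification baseline (37% to 59%), which matches the reported numbers but is not stated explicitly.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between held-out observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Fraction of variation in the observations explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def relative_change(baseline, new):
    """Change of `new` relative to `baseline`, as a fraction of the baseline."""
    return (new - baseline) / baseline

def spatial_holdout(samples, x_split):
    """Hypothetical spatial split: fit on one side of x_split, validate on the other.

    `samples` is a list of dicts with at least an 'x' key. Unlike out-of-bag
    validation, no validation sample lies inside the region used for fitting.
    """
    train = [s for s in samples if s["x"] < x_split]
    valid = [s for s in samples if s["x"] >= x_split]
    return train, valid

# Figures reported in the abstract (assumed baseline: stratification).
gain_explained = relative_change(0.37, 0.59)   # about 0.59, i.e. roughly 60%
change_rmse = relative_change(33.0, 26.0)      # about -0.21, i.e. roughly 21% lower
```

Under these assumptions, the gain in explained variation comes out near 0.59 and the RMSE falls by about a fifth. A Random Forest run "with spatial context" would simply add each sample's x and y values to the predictor columns before fitting, and both runs would be scored with `r_squared` and `rmse` on the held-out half.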

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Carbon / analysis*
Climate Change*
Conservation of Natural Resources
Environmental Monitoring
Models, Theoretical*
Trees*

Substances

Carbon

Grants and funding

This study was supported by the John D. and Catherine T. MacArthur Foundation and the endowment of the Carnegie Institution for Science. The Carnegie Airborne Observatory is made possible by the Avatar Alliance Foundation, Gordon and Betty Moore Foundation, W. M. Keck Foundation, Margaret A. Cargill Foundation, Grantham Foundation for the Protection of the Environment, Mary Anne Nyburg Baker and G. Leonard Baker Jr., and William R. Hearst III. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.