The CCB-ID approach to tree species mapping with airborne imaging spectroscopy

PeerJ. 2018 Oct 8;6:e5666. doi: 10.7717/peerj.5666. eCollection 2018.

Abstract

Background: Biogeographers assess how species distributions and abundances affect the structure, function, and composition of ecosystems. Yet we face a major challenge: it is difficult to precisely map species across landscapes. Novel Earth observations could overcome this challenge for vegetation mapping. Airborne imaging spectrometers measure plant functional traits at high resolution, and these measurements can be used to identify tree species. In this paper, I describe a trait-based approach to species identification with imaging spectroscopy, the Center for Conservation Biology species identification (CCB-ID) method, which was developed as part of an ecological data science evaluation competition.

Methods: These methods were developed using airborne imaging spectroscopy data from the National Ecological Observatory Network (NEON). CCB-ID classified tree species using trait-based reflectance variation and decision tree-based machine learning models, approximating a morphological trait and dichotomous key method inspired by botanical classification. First, outliers were removed using a spectral variance threshold. The remaining samples were transformed using principal components analysis (PCA) and resampled to reduce common species biases. Gradient boosting and random forest classifiers were trained using the transformed and resampled feature data. Prediction probabilities were calibrated using sigmoid regression, and sample-scale predictions were averaged to the crown scale.
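The later steps of this workflow can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the CCB-ID implementation: the function name, the number of components, the estimator settings, the cross-validation folds, and the choice to average the two models' probabilities are all assumptions. Outlier removal and resampling are omitted because the abstract does not specify their thresholds or schemes.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier


def crown_probabilities(X_train, y_train, X_test, crown_ids, n_components=5, seed=0):
    """Hypothetical helper: return (crowns, classes, crown-scale probabilities)."""
    # Project reflectance spectra onto a few principal components.
    pca = PCA(n_components=n_components, random_state=seed)
    Z_train = pca.fit_transform(X_train)
    Z_test = pca.transform(X_test)

    # Fit two tree-based classifiers, each wrapped in sigmoid calibration.
    models = [
        GradientBoostingClassifier(n_estimators=50, random_state=seed),
        RandomForestClassifier(n_estimators=50, random_state=seed),
    ]
    probs = []
    for model in models:
        calibrated = CalibratedClassifierCV(model, method="sigmoid", cv=3)
        calibrated.fit(Z_train, y_train)
        probs.append(calibrated.predict_proba(Z_test))

    # Combine the two models by simple averaging (an assumption).
    sample_probs = np.mean(probs, axis=0)

    # Average sample-scale probabilities within each crown.
    crown_ids = np.asarray(crown_ids)
    crowns = np.unique(crown_ids)
    crown_probs = np.vstack(
        [sample_probs[crown_ids == c].mean(axis=0) for c in crowns]
    )
    return crowns, calibrated.classes_, crown_probs
```

Here `X_train` and `X_test` are arrays of per-sample reflectance spectra, `y_train` holds species labels, and `crown_ids` assigns each test sample to a tree crown. Each row of the returned array is one crown's probability distribution over species.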

Results: CCB-ID received a rank-1 accuracy score of 0.919, and a cross-entropy cost score of 0.447 on the competition test data. Accuracy and specificity scores were high for all species, but precision and recall scores varied for rare species. PCA transformation improved accuracy scores compared to models trained using reflectance data, but outlier removal and data resampling exacerbated class imbalance problems.
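For readers unfamiliar with the two competition scores, the sketch below computes them under their standard definitions: rank-1 accuracy as the fraction of crowns whose most probable species is the true one, and cross-entropy as the mean negative log probability assigned to the true species. The competition's exact formulas may differ in detail, and the species names and probabilities here are made up for illustration.

```python
import math


def rank1_accuracy(probs, labels):
    """Fraction of crowns whose most probable species is the true species."""
    hits = sum(1 for p, y in zip(probs, labels) if max(p, key=p.get) == y)
    return hits / len(labels)


def cross_entropy(probs, labels, eps=1e-15):
    """Mean negative log probability assigned to the true species."""
    total = 0.0
    for p, y in zip(probs, labels):
        # Clip to avoid log(0) when the true species received zero probability.
        total -= math.log(min(max(p.get(y, 0.0), eps), 1.0))
    return total / len(labels)


# Hypothetical crown-scale probabilities for three crowns and two species.
probs = [
    {"A": 0.9, "B": 0.1},
    {"A": 0.6, "B": 0.4},
    {"A": 0.2, "B": 0.8},
]
labels = ["A", "B", "B"]

accuracy = rank1_accuracy(probs, labels)  # 2 of 3 crowns ranked correctly
cost = cross_entropy(probs, labels)       # lower is better
```

Higher accuracy is better, while a lower cross-entropy cost indicates that the predicted probabilities are both correct and well calibrated.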

Discussion: CCB-ID accurately classified tree species using NEON data, reporting the best scores among participants. However, it failed to overcome several species mapping challenges, such as precisely identifying rare species. Key takeaways include: (1) selecting models using metrics beyond accuracy (e.g., recall) could improve rare species predictions; (2) within-genus trait variation may drive spectral separability, precluding efforts to distinguish between functionally convergent species; (3) outlier removal and data resampling can exacerbate class imbalance problems and should be implemented carefully; (4) PCA transformation greatly improved model results; and (5) targeted feature selection could further improve species classification models. CCB-ID is open source, designed for use with NEON data, and available to support species mapping efforts.
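Takeaway (1) can be illustrated with a small worked example. The sketch below uses made-up labels for a hypothetical test set with one common and one rare species. It shows how accuracy and specificity can stay high for a rare species even when precision and recall are poor.

```python
def one_vs_rest_scores(y_true, y_pred, species):
    """Accuracy, specificity, precision, and recall for one species versus all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == species and p == species for t, p in pairs)
    fp = sum(t != species and p == species for t, p in pairs)
    fn = sum(t == species and p != species for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


# Hypothetical test set: 95 crowns of a common species, 5 of a rare one.
y_true = ["common"] * 95 + ["rare"] * 5
# The model finds 2 of the 5 rare crowns and mislabels 1 common crown as rare.
y_pred = ["common"] * 94 + ["rare"] + ["rare"] * 2 + ["common"] * 3

scores = one_vs_rest_scores(y_true, y_pred, "rare")
```

In this example the rare species scores 0.96 on accuracy and about 0.99 on specificity, yet only 0.4 on recall, so a model selected on accuracy alone could still miss most rare crowns.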

Keywords: Biogeography; Imaging spectroscopy; Modeling; Open source; Remote sensing; Species mapping.

Grants and funding

C. B. Anderson was supported by the Bing-Mooney Fellowship in Environmental Science and Conservation at Stanford University’s Department of Biology. The ECODSE competition was supported, in part, by a research grant from the NIST IAD Data Science Research Program to D.Z. Wang, E.P. White, and S. Bohlman, by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative through grant GBMF4563 to E.P. White, and by an NSF Dimensions of Biodiversity program grant (DEB-1442280) to S. Bohlman. The National Ecological Observatory Network is a program sponsored by the National Science Foundation and operated under cooperative agreement by Battelle Memorial Institute. This material is based in part upon work supported by the National Science Foundation through the NEON Program. There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.