Using Convolutional Neural Networks to Efficiently Extract Immense Phenological Data From Community Science Images

Rachel A Reeb; Naeem Aziz; Samuel M Lapp; Justin Kitzes; J Mason Heberling; Sara E Kuebbing

doi:10.3389/fpls.2021.787407

Front Plant Sci. 2022 Jan 17:12:787407. doi: 10.3389/fpls.2021.787407. eCollection 2021.

Authors

Rachel A Reeb¹, Naeem Aziz¹, Samuel M Lapp¹, Justin Kitzes¹, J Mason Heberling¹,², Sara E Kuebbing¹,²

Affiliations

¹ Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, United States.
² Section of Botany, Carnegie Museum of Natural History, Pittsburgh, PA, United States.

Abstract

Community science image libraries offer a massive, but largely untapped, source of observational data for phenological research. The iNaturalist platform offers a particularly rich archive, containing more than 49 million verifiable, georeferenced, open access images, encompassing seven continents and over 278,000 species. A critical limitation preventing scientists from taking full advantage of this rich data source is labor. Each image must be manually inspected and categorized by phenophase, which is both time-intensive and costly. Consequently, researchers may only be able to use a subset of the total number of images available in the database. While iNaturalist has the potential to yield enough data for high-resolution and spatially extensive studies, it requires more efficient tools for phenological data extraction. A promising solution is automation of the image annotation process using deep learning. Recent innovations in deep learning have made these open-source tools accessible to a general research audience. However, it is unknown whether deep learning tools can accurately and efficiently annotate phenophases in community science images. Here, we train a convolutional neural network (CNN) to classify iNaturalist images of Alliaria petiolata into distinct phenophases and compare the performance of the model with that of non-expert human annotators. We demonstrate that researchers can successfully employ deep learning techniques to extract phenological information from community science images. A CNN classified two-stage phenology (flowering and non-flowering) with 95.9% accuracy and classified four-stage phenology (vegetative, budding, flowering, and fruiting) with 86.4% accuracy. The overall accuracy of the CNN did not differ significantly from that of humans (p = 0.383), although performance varied across phenophases.
We found that a primary challenge of using deep learning for image annotation lay not in the model itself but in the quality of the community science images. Up to 4% of A. petiolata images in iNaturalist were taken from an improper distance, were physically manipulated, or were digitally altered, which limited the ability of both human and machine annotators to classify phenology accurately. Thus, we provide a list of photography guidelines that could be included in community science platforms to inform community scientists of best practices for creating images that facilitate phenological analysis.
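
The abstract reports an overall accuracy alongside performance that varied across phenophases. As a minimal sketch of how those two kinds of figure can be computed from a list of annotations, the Python below scores predicted labels against reference labels. The function names and the example labels are hypothetical illustrations, not the authors' code or data, and per-phase recall is only one assumed way to express per-phenophase performance.

```python
from collections import Counter

# The four phenophases named in the abstract.
PHENOPHASES = ("vegetative", "budding", "flowering", "fruiting")


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of images whose predicted phenophase matches the reference label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one labelled image is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_phase_recall(true_labels, predicted_labels):
    """For each phenophase present in the reference labels, the fraction
    of its images that the annotator labelled correctly."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {
        phase: hits[phase] / totals[phase]
        for phase in PHENOPHASES
        if totals[phase] > 0
    }


if __name__ == "__main__":
    # Hypothetical toy labels, for illustration only.
    reference = ["vegetative", "budding", "flowering", "fruiting", "flowering"]
    predicted = ["vegetative", "flowering", "flowering", "fruiting", "budding"]
    print(overall_accuracy(reference, predicted))
    print(per_phase_recall(reference, predicted))
```

The same two functions can score either a model or a human annotator against the same reference labels, which is the kind of side-by-side comparison the abstract describes.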

Keywords: Alliaria petiolata (garlic mustard); citizen science; convolutional neural network; deep learning; iNaturalist; phenology.

Associated data

Dryad/10.5061/dryad.mkkwh7123