A new fine-grained method for automated visual analysis of herbarium specimens: A case study for phenological data extraction

Appl Plant Sci. 2020 Jul 1;8(6):e11368. doi: 10.1002/aps3.11368. eCollection 2020 Jun.

Abstract

Premise: Herbarium specimens represent an outstanding source of material with which to study plant phenological changes in response to climate change. The fine-scale phenological annotation of such specimens is nevertheless highly time consuming and requires substantial human investment and expertise, which are difficult to rapidly mobilize.

Methods: We trained and evaluated new deep learning models to automate the detection, segmentation, and classification of four reproductive structures of Streptanthus tortuosus (flower buds, flowers, immature fruits, and mature fruits). We used a training data set of 21 digitized herbarium sheets for which the position and outlines of 1036 reproductive structures were annotated manually. We adjusted the hyperparameters of a Mask R-CNN (region-based convolutional neural network) to this specific task and evaluated the resulting trained models for their ability to count reproductive structures and estimate their size.
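As an illustration of how counts and sizes can be derived from instance segmentation output, the following minimal sketch takes one predicted mask, class label, and confidence score per detected structure. It is not the authors' pipeline. The input layout, the `min_score` threshold, the `px_per_mm` scale, and the choice of the longer bounding-box side as the size measure are all assumptions made for this example.

```python
from collections import defaultdict


def mask_extent_px(mask):
    """Longer side, in pixels, of the bounding box around a binary mask.

    `mask` is a list of rows, each row a list of 0/1 values.
    Returns 0 for an empty mask.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return 0
    height = rows[-1] - rows[0] + 1
    width = max(cols) - min(cols) + 1
    return max(height, width)


def summarize_instances(instances, px_per_mm, min_score=0.5):
    """Count detected structures per class and estimate each one's size in mm.

    `instances` is a list of dicts with hypothetical keys "label", "score",
    and "mask". Detections below `min_score` or with empty masks are skipped.
    """
    counts = defaultdict(int)
    sizes_mm = defaultdict(list)
    for inst in instances:
        extent = mask_extent_px(inst["mask"])
        if inst["score"] < min_score or extent == 0:
            continue
        counts[inst["label"]] += 1
        # Convert the pixel extent to millimeters using the assumed image scale.
        sizes_mm[inst["label"]].append(extent / px_per_mm)
    return dict(counts), dict(sizes_mm)
```

A real workflow would more likely hold masks in array form and might measure length along the structure's main axis rather than its bounding box, which overestimates the size of curved or diagonal structures.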

Results: The main outcome of our study is that detection and segmentation performance can vary significantly with (i) the type of annotations used for training, (ii) the type of reproductive structure, and (iii) the size of the reproductive structures. In the case of Streptanthus tortuosus, the method can provide quite accurate estimates of the number of reproductive structures (77.9% of cases); counts are better estimated for flowers than for immature fruits and buds. The size estimation results are also encouraging, showing a difference of only a few millimeters between the predicted and actual sizes of buds and flowers.
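The two kinds of result reported above, agreement on counts and error on sizes, can be computed from paired predicted and annotated values. The sketch below is illustrative only: the abstract does not state the criterion behind "accurate estimates", so the `tolerance` parameter and both function names are assumptions, and the numbers in the usage lines are made up for the example rather than taken from the study.

```python
def count_agreement(predicted, actual, tolerance=1):
    """Fraction of specimens whose predicted count is within `tolerance`
    of the manually annotated count (the tolerance is an assumed criterion)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(abs(p - a) <= tolerance for p, a in zip(predicted, actual))
    return hits / len(actual)


def mean_abs_error(predicted, actual):
    """Mean absolute difference between predicted and measured values,
    for example structure sizes in millimeters."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Made-up values for illustration only.
agreement = count_agreement([3, 5, 0, 8], [3, 7, 1, 8], tolerance=1)
size_error_mm = mean_abs_error([4.0, 6.5], [5.0, 6.0])
```

Computing both metrics separately for each class of reproductive structure would expose the per-class differences that the results describe.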

Discussion: This method has great potential for automating the analysis of reproductive structures in high-resolution images of herbarium sheets. Deeper investigations regarding the taxonomic scalability of this approach and its potential improvement will be conducted in future work.

Keywords: automated regional segmentation; deep learning; herbarium data; natural history collections; phenological stage annotation; phenophase; regional convolutional neural network; visual data classification.