Investigating Explanatory Factors of Machine Learning Models for Plant Classification

Wilfried Wöber; Lars Mehnen; Peter Sykacek; Harald Meimberg

doi:10.3390/plants10122674

Plants (Basel). 2021 Dec 5;10(12):2674. doi: 10.3390/plants10122674.

Authors

Wilfried Wöber¹ ², Lars Mehnen³, Peter Sykacek⁴, Harald Meimberg¹

Affiliations

¹ Department of Integrative Biology and Biodiversity Research, Institute of Integrative Conservation Research, University of Natural Resources and Life Sciences, Gregor Mendel Str. 33, 1080 Vienna, Austria.
² Department Industrial Engineering, University of Applied Sciences Technikum Wien, Höchstädtplatz 6, 1200 Vienna, Austria.
³ Department Computer Science, University of Applied Sciences Technikum Wien, Höchstädtplatz 6, 1200 Vienna, Austria.
⁴ Department of Biotechnology, Institute of Computational Biology, University of Natural Resources and Life Sciences, Muthgasse 18, 1190 Vienna, Austria.

Abstract

Recent progress in machine learning and deep learning has enabled plant and crop detection based on systematic inspection of leaf shapes and other morphological characters, supporting identification systems for precision farming. However, the models used for this approach tend to be black boxes, in the sense that it is difficult to trace the characters on which the classification is based. Interpretability is therefore limited, and the explanatory factors may not correspond to reasonable visible characters. We investigate the explanatory factors of recent machine learning and deep learning models for plant classification tasks. Based on a Daucus carota and a Beta vulgaris image data set, we implement plant classification models and compare them by predictive performance as well as explainability. As a default model for comparison, we implemented a feed-forward convolutional neural network. To evaluate the performance, we trained an unsupervised Bayesian Gaussian process latent variable model as well as a convolutional autoencoder for feature extraction and relied on a support vector machine for classification. The explanatory factors of all models were extracted and analyzed. The experiments show that the feed-forward convolutional neural network (98.24% and 96.10% mean accuracy) outperforms the Bayesian Gaussian process latent variable pipeline (92.08% and 94.31% mean accuracy) as well as the convolutional autoencoder pipeline (92.38% and 93.28% mean accuracy) in terms of classification accuracy, although the difference is not significant for Beta vulgaris images. Additionally, we found that the neural network used biologically uninterpretable image regions for the plant classification task. In contrast, the unsupervised learning models rely on explainable visual characters. We conclude that supervised convolutional neural networks must be used carefully to ensure biological interpretability. We recommend unsupervised machine learning, careful feature investigation, and statistical feature analysis for biological applications.

Keywords: deep learning; explainable AI; machine learning; plant leaf morphometrics.
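
Illustrative sketch: The abstract describes pipelines in which unsupervised feature extraction is followed by a support vector machine, with the learned features then inspected for interpretability. The Python sketch below illustrates that general pattern only and is not the authors' implementation. It assumes scikit-learn and NumPy, substitutes PCA for the Gaussian process latent variable model and the convolutional autoencoder, and uses synthetic 16x16 toy images (the hypothetical helper `make_toy_images`) in place of the Daucus carota and Beta vulgaris data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
SIZE = 16  # side length of the toy images (assumption, not from the paper)


def make_toy_images(n_per_class=100, size=SIZE):
    """Hypothetical stand-in data: class 0 is brighter on the left half,
    class 1 is brighter on the right half, both with Gaussian pixel noise."""
    images = rng.normal(0.0, 1.0, size=(2 * n_per_class, size, size))
    labels = np.repeat([0, 1], n_per_class)
    images[labels == 0, :, : size // 2] += 1.0
    images[labels == 1, :, size // 2:] += 1.0
    return images.reshape(len(labels), -1), labels


X, y = make_toy_images()

# Unsupervised feature extraction (PCA as a simple stand-in) feeding an SVM.
pipeline = make_pipeline(PCA(n_components=5, random_state=0), SVC(kernel="rbf"))
scores = cross_val_score(pipeline, X, y, cv=5)  # stratified 5-fold accuracy
mean_accuracy = scores.mean()

# Inspect an extracted feature: reshape the first principal component back
# into image space to see which pixel regions it weights.
pca = PCA(n_components=5, random_state=0).fit(X)
first_component = pca.components_[0].reshape(SIZE, SIZE)
left_weight = first_component[:, : SIZE // 2].mean()
right_weight = first_component[:, SIZE // 2:].mean()

print(f"mean accuracy: {mean_accuracy:.3f}")
print(f"left/right weights of component 1: {left_weight:.3f} / {right_weight:.3f}")
```

In this toy setting the first component contrasts the left and right image halves, which is the region that actually separates the classes. Checking whether an extracted feature maps onto a visible character in this way is the kind of feature investigation the abstract recommends.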