Do you get what you see? Insights of using mAP to select architectures of pretrained neural networks for automated aerial animal detection

Mael Moreni; Jerome Theau; Samuel Foucher

doi:10.1371/journal.pone.0284449

PLoS One. 2023 Apr 24;18(4):e0284449. doi: 10.1371/journal.pone.0284449. eCollection 2023.

Authors

Mael Moreni¹,², Jerome Theau¹,², Samuel Foucher¹

Affiliations

¹ Department of Applied Geomatics, Université de Sherbrooke, Sherbrooke, Quebec, Canada.
² Quebec Centre for Biodiversity Science (QCBS), Montreal, Quebec, Canada.

Abstract

The vast number of images that aerial imagery now generates during regular wildlife surveys requires automatic processing tools. Among the many methods for automatically detecting objects in images, deep learning-based object detection currently leads. The recent focus on this task has produced an influx of neural network architectures that are benchmarked against standard datasets like Microsoft's Common Objects in COntext (COCO). Performance on COCO, a large dataset of computer vision images, is reported as mean Average Precision (mAP). In this study, we use six pretrained networks to detect red deer in aerial images, three of which have never been used, to our knowledge, in aerial wildlife surveys. We compare their performance using COCO's mAP and the F1-score, a common test metric in animal surveys. We also evaluate how dataset imbalance and background uniformity, two common difficulties in wildlife surveys, affect the performance of our models. Our results show that the mAP is not a reliable metric for selecting the best model to count animals in aerial images and that a counting-focused metric like the F1-score should be favored instead. Our best overall performance was achieved with Generalized Focal Loss (GFL). It scored the highest on both metrics, combining the most accurate counting and localization (average F1-scores of 0.96 and 0.97 and average mAP scores of 0.77 and 0.89 on the two datasets, respectively), and is therefore very promising for future applications. While both imbalance and background uniformity improved the performance of our models, their combined effect had twice as much impact as the choice of architecture. This finding seems to confirm that the recent data-centric shift in the deep learning field could also lead to performance gains in wildlife surveys.
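To illustrate what a counting-focused metric measures, the sketch below computes an F1-score for one image by matching predicted boxes to ground-truth boxes. It is not the authors' evaluation code: the greedy matching rule, the IoU threshold of 0.5, the `(x1, y1, x2, y2)` box format, and the function names `iou` and `f1_from_detections` are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_from_detections(predictions, ground_truth, iou_threshold=0.5):
    """F1-score for one image.

    predictions: list of (score, box) pairs; ground_truth: list of boxes.
    Assumption: each prediction, taken in descending score order, is matched
    to the unmatched ground-truth box with the highest IoU, provided that
    IoU reaches iou_threshold.
    """
    matched = set()
    tp = 0
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_j, best_iou = None, iou_threshold
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp = len(predictions) - tp   # detections with no matching animal
    fn = len(ground_truth) - tp  # animals that were missed
    denom = 2 * tp + fp + fn
    # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN).
    return 2 * tp / denom if denom else 0.0
```

Under these assumptions, a detection counts as correct as soon as its overlap passes the threshold, so a loosely placed box around a real animal scores the same as a tight one. The mAP, by contrast, also rewards localization quality by averaging over several IoU thresholds, which is one way the two metrics can rank models differently.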

Copyright: © 2023 Moreni et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Animals
Animals, Wild
Deer*
Neural Networks, Computer

Grants and funding

We are grateful for the support and funding provided by Mitacs (IT07901), the Quebec Center for Geomatics, and Microdrones that made this research possible. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.