Animal Scanner: Software for classifying humans, animals, and empty frames in camera trap images

Hayder Yousif; Jianhe Yuan; Roland Kays; Zhihai He

doi:10.1002/ece3.4747

Ecol Evol. 2019 Feb 10;9(4):1578-1589. doi: 10.1002/ece3.4747. eCollection 2019 Feb.

Authors

Hayder Yousif¹, Jianhe Yuan¹, Roland Kays²,³, Zhihai He¹

Affiliations

¹ Department of Electrical and Computer Engineering, University of Missouri-Columbia, Columbia, Missouri.
² Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, North Carolina.
³ North Carolina Museum of Natural Sciences, Raleigh, North Carolina.

Abstract

Camera traps are a popular tool to sample animal populations because they are noninvasive, detect a variety of species, and can record many thousands of animal detections per deployment. Cameras are typically set to take bursts of multiple photographs for each detection and are deployed in arrays of dozens or hundreds of sites, often resulting in millions of photographs per study. The task of converting photographs to animal detection records from such large image collections is daunting, and is made worse by situations that generate copious empty pictures from false triggers (e.g., camera malfunction or moving vegetation) or pictures of humans. We developed computer vision algorithms to detect and classify moving objects to aid the first step of camera trap image filtering: separating the animal detections from the empty frames and pictures of humans. Our new work couples foreground object segmentation through background subtraction with deep learning classification to provide a fast and accurate scheme for human-animal detection. We provide these programs as both a Matlab GUI and a command-prompt program developed in C++. The software reads folders of camera trap images and outputs images annotated with bounding boxes around moving objects and a text file summary of results. This software maintains high accuracy while reducing the execution time by a factor of 14. It takes about 6 seconds to process a sequence of ten frames (on a 2.6 GHz CPU computer). For cameras with excessive empty frames due to camera malfunction or blowing vegetation, the software automatically removes 54% of the false-trigger sequences without affecting the human/animal sequences. We achieve 99.58% accuracy on image-level empty versus object classification of the Serengeti dataset. We offer the first computer vision tool for processing camera trap images, providing substantial time savings for processing large image datasets and thus improving our ability to monitor wildlife across large scales with camera traps.

Keywords: background subtraction; camera trap images; deep convolutional neural networks; human–animal detection; wildlife monitoring.
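
The background-subtraction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple per-pixel median background model over one burst of small grayscale frames, and it omits the deep learning classification stage entirely. The function names (`blank`, `median_background`, `foreground_box`, `summarize_sequence`) and the threshold of 30 gray levels are hypothetical choices made only for illustration.

```python
from statistics import median


def blank(rows=4, cols=4, value=10):
    """Build a uniform grayscale frame as a list of rows (illustrative helper)."""
    return [[value] * cols for _ in range(rows)]


def median_background(frames):
    """Estimate the background as the per-pixel median across a burst."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median(f[r][c] for f in frames) for c in range(cols)]
            for r in range(rows)]


def foreground_box(frame, background, threshold=30):
    """Return (top, left, bottom, right) of pixels that differ from the
    background by more than `threshold`, or None if nothing moved."""
    hits = [(r, c)
            for r, row in enumerate(frame)
            for c, value in enumerate(row)
            if abs(value - background[r][c]) > threshold]
    if not hits:
        return None
    rs = [r for r, _ in hits]
    cs = [c for _, c in hits]
    return (min(rs), min(cs), max(rs), max(cs))


def summarize_sequence(frames, threshold=30):
    """One bounding box (or None for an empty frame) per frame in the burst."""
    background = median_background(frames)
    return [foreground_box(f, background, threshold) for f in frames]


# A three-frame burst in which only the middle frame contains a bright object.
burst = [blank(), blank(), blank()]
for r in (1, 2):
    for c in (1, 2):
        burst[1][r][c] = 200

print(summarize_sequence(burst))  # [None, (1, 1, 2, 2), None]
```

In this toy setup, frames that return `None` correspond to empty frames, while frames with a box would be the candidates passed on to a human/animal classifier. A median background only works when the object occupies a given pixel in fewer than half of the frames, which is one reason real systems use more robust models.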