Fast human-animal detection from highly cluttered camera-trap images using joint background modeling and deep learning classification


Abstract:

In this paper, we couple effective dynamic background modeling with deep learning classification to develop a fast and accurate scheme for human-animal detection from highly cluttered camera-trap images. Specifically, we first develop an effective background modeling and subtraction scheme to generate region proposals for foreground objects. We then develop a cross-frame image patch verification step to reduce the number of foreground object proposals. Finally, we perform a complexity-accuracy analysis of deep convolutional neural networks (DCNNs) to develop a fast deep learning classification scheme that classifies these region proposals into three categories: human, animal, and background patches. The optimized DCNN maintains a high level of accuracy while reducing the computational complexity by 14 times. Our experimental results demonstrate that the proposed method outperforms existing methods on the camera-trap dataset.
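As a rough illustration of the background modeling and subtraction stage (the paper's exact model is not specified in the abstract), the sketch below uses an exponential running-average background with thresholded frame differencing to produce a coarse foreground bounding box as a region proposal. The function names and the `alpha` and `thresh` values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Exponential running-average background model (illustrative)."""
    return (1.0 - alpha) * bg + alpha * frame

def foreground_box(bg, frame, thresh=25.0):
    """Return a bounding box (x0, y0, x1, y1) enclosing all pixels that
    differ from the background by more than thresh, or None if no pixel does."""
    mask = np.abs(frame.astype(np.float64) - bg) > thresh
    if not mask.any():
        return None
    ys, xs = np.nonzero(mask)
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

# Demo: a static 64x64 background with a bright 8x8 patch standing in
# for a moving animal.
bg = np.zeros((64, 64))
frame = bg.copy()
frame[20:28, 30:38] = 255.0
print(foreground_box(bg, frame))  # (30, 20, 37, 27)
bg = update_background(bg, frame)  # absorb the new frame into the model
```

In practice each proposal would then be cropped from the image, checked across adjacent frames (the cross-frame verification step), and passed to the DCNN classifier.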
Date of Conference: 28-31 May 2017
Date Added to IEEE Xplore: 28 September 2017
Electronic ISSN: 2379-447X
Conference Location: Baltimore, MD, USA

I. Introduction

Camera-traps are stationary camera-sensor systems attached to trees in the field. Triggered by animal motion, they capture short image sequences of animal appearance and activities along with other sensor data, such as light level, moisture, temperature, and GPS coordinates. Operating in a non-invasive manner, they record animal appearance without disturbance [1], [2]. During the past several years, a vast number of camera-trap images has been collected, far exceeding the capacity for manual image processing and annotation by humans. There is an urgent need to develop automated animal detection, segmentation, tracking, and biometric feature extraction tools for processing these massive camera-trap datasets. Fig. 1 shows some examples of large background motion and low contrast between animal and background. Images captured in natural environments represent a large class of challenging scenes that have not been sufficiently addressed in the literature: such scenes are often highly cluttered and dynamic, with swaying trees, rippling water, moving shadows, sun spots, and rain. Natural animal camouflage adds further complexity to the analysis of these scenes.

