I. Introduction
Camera traps are stationary camera-sensor systems attached to trees in the field. Triggered by animal motion, they capture short image sequences of animal appearance and activity, along with other sensor data such as light level, moisture, temperature, and GPS location. Because they operate in a non-invasive manner, they record animals without disturbing them [1], [2]. Over the past several years, vast numbers of camera-trap images have been collected, far exceeding the capacity of manual image processing and annotation by humans. There is therefore an urgent need for automated animal detection, segmentation, tracking, and biometric feature extraction tools to process these massive camera-trap datasets. Images captured in natural environments represent a large class of challenging scenes that have not been sufficiently addressed in the literature: they are often highly cluttered and dynamic, with swaying trees, rippling water, moving shadows, sun spots, rain, etc. Natural animal camouflage adds further complexity to the analysis of these scenes. Fig. 1 shows examples of large background motion and low contrast between the animal and the background.