We propose a framework for learning robust, adaptive appearance models to be used for motion-based tracking of natural objects. The approach involves a mixture of stable image structure, learned over long time courses, along with 2-frame motion information and an outlier process. An online EM-algorithm is used to adapt the appearance model parameters over time. An implementation of this approach is developed for an appearance model based on the filter responses from a steerable pyramid. This model is used in a motion-based tracking algorithm to provide robustness in the face of image outliers, such as those caused by occlusions. It also provides the ability to adapt to natural changes in appearance, such as those due to facial expressions or variations in 3D pose. We show experimental results on a variety of natural image sequences of people moving within cluttered environments.