In this paper we propose a method for the automatic detection of salient objects in video streams. The video is first segmented into shots using a scale-space filtering graph-partition method. Next, we introduce a combined spatial and temporal video attention model, which pairs a region-based contrast saliency measure with a novel temporal attention model. Camera/background motion is described by a set of homographic transforms, estimated by recursively applying the RANSAC algorithm to SIFT interest-point correspondences, while other types of motion are identified using agglomerative clustering and temporal region consistency. The final saliency decision fuses the spatial and temporal attention models. Finally, we demonstrate how the extracted saliency map can be used to create segmentation masks. The experimental results validate the proposed framework and show that our approach is effective for various types of videos, including noisy and low-resolution data.
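The abstract does not spell out the motion-estimation step, but the core idea of fitting a homography to interest-point correspondences with RANSAC can be sketched in a few lines of NumPy. The function names `dlt_homography` and `ransac_homography` are illustrative, not from the paper; in the actual pipeline the correspondences would come from SIFT matching between consecutive frames, and the inlier set would be removed before re-running RANSAC to pick up further motion layers.

```python
import numpy as np

def dlt_homography(src, dst):
    """Direct Linear Transform: fit a 3x3 homography H to >= 4 point pairs,
    so that dst ~ H @ src in homogeneous coordinates."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector for the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def ransac_homography(src, dst, iters=500, thresh=3.0, seed=0):
    """RANSAC: repeatedly fit H to random 4-point samples and keep the
    hypothesis with the most correspondences within `thresh` pixels."""
    rng = np.random.default_rng(seed)
    n = len(src)
    src_h = np.hstack([src, np.ones((n, 1))])  # homogeneous source points
    best_H, best_inliers = None, np.zeros(n, dtype=bool)
    for _ in range(iters):
        idx = rng.choice(n, 4, replace=False)
        try:
            H = dlt_homography(src[idx], dst[idx])
        except np.linalg.LinAlgError:
            continue  # degenerate sample
        proj = src_h @ H.T
        with np.errstate(divide="ignore", invalid="ignore"):
            proj = proj[:, :2] / proj[:, 2:3]
        err = np.linalg.norm(proj - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_H, best_inliers = H, inliers
    return best_H, best_inliers
```

Points consistent with the dominant (camera/background) homography come out as inliers; the remaining outliers are the candidates for independently moving objects, which the paper groups by agglomerative clustering.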