Rapid developments in the Internet and multimedia applications allow us to access large amounts of image and video data. While significant progress has been made in digital data compression, content-based functionalities are still quite limited. Many existing techniques in content-based retrieval are based on global visual features extracted from the entire image. To provide more efficient content-based functionalities for video applications, it is necessary to extract meaningful video objects from scenes and thereby enable object-based representation of video content. Object-based representation was also introduced by MPEG-4 to enable content-based functionality and high coding efficiency. In this paper, we propose a new algorithm that automatically extracts meaningful video objects from video sequences. The algorithm begins with robust motion segmentation on the first two successive frames. To detect moving objects, segmented regions are grouped together according to their spatial similarity. A binary object model for each moving object is automatically derived and tracked in subsequent frames using the generalized Hausdorff distance. The object model is updated at each frame to accommodate complex motions and shape changes of the object. Experimental results on different types of video sequences are presented to demonstrate the efficiency and accuracy of the proposed algorithm.
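The tracking step can be illustrated with a minimal sketch of the generalized (rank-order) Hausdorff distance as it is commonly defined: instead of the maximum of the nearest-neighbour distances from model points to image points, the k-th ranked value is used, so a fraction of mismatched points is tolerated. The abstract does not give the exact formulation or search strategy, so the function names, the `fraction` parameter, and the brute-force translation search below are assumptions made for illustration only.

```python
import math


def directed_generalized_hausdorff(model, image, fraction=0.8):
    """Return the k-th ranked nearest-neighbour distance from model to image.

    `model` and `image` are lists of (x, y) points, e.g. the "on" pixels of a
    binary object model and of an edge map. `fraction` (hypothetical name)
    selects the rank: 1.0 gives the classical directed Hausdorff distance.
    """
    if not model or not image:
        raise ValueError("both point sets must be non-empty")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    # Distance from every model point to its closest image point, ascending.
    nearest = sorted(min(math.dist(p, q) for q in image) for p in model)
    k = max(1, math.ceil(fraction * len(nearest)))
    return nearest[k - 1]


def best_translation(model, image, search=2, fraction=0.8):
    """Brute-force search (an assumption, not the paper's method) for the
    integer translation of `model` that minimises the distance to `image`."""
    best = None
    for dx in range(-search, search + 1):
        for dy in range(-search, search + 1):
            shifted = [(x + dx, y + dy) for x, y in model]
            d = directed_generalized_hausdorff(shifted, image, fraction)
            if best is None or d < best[0]:
                best = (d, (dx, dy))
    return best


# Toy example: a 2x2 square model with one stray point, and the same square
# displaced by (2, 1) in the next frame.
model_points = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5)]
frame_points = [(2, 1), (3, 1), (2, 2), (3, 2)]
distance, shift = best_translation(model_points, frame_points)
print(distance, shift)
```

In this toy case the stray model point has no counterpart in the frame, so the classical maximum-based distance stays large for every translation, while the ranked version ignores it and recovers the displacement of the square. A practical implementation would typically replace the inner nearest-neighbour loop with a distance transform of the frame.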