A novel visual object tracking scheme is proposed that jointly uses point feature correspondences and object appearance similarity. For point feature-based tracking, we propose a candidate tracker that simultaneously exploits two separate sets of point feature correspondences, one in the foreground and one in the surrounding background, where the background features serve to indicate occlusions. The feature points in both sets are dynamically maintained. For object appearance-based tracking, we propose a candidate tracker based on an enhanced anisotropic mean shift with a fully tunable bounding box (five degrees of freedom) that is partially guided by the feature point tracker. Both candidate trackers include a reinitialization process that resets the tracker to prevent tracking errors from accumulating and propagating across frames. In addition, a novel online learning method is introduced for the enhanced mean shift-based candidate tracker: the reference object distribution is updated in each time interval in which there is an indication of stable and reliable tracking without background interference. Dynamically updating the reference object model yields a more accurate object appearance similarity measure and thereby further improves tracking. The final tracker applies an optimal selection criterion to the results of the two candidate trackers. Experiments have been conducted on several videos containing a range of complex scenarios. The proposed scheme is evaluated using three objective criteria and compared with two existing trackers. The results show that the proposed scheme is robust and yields a marked improvement in terms of tracking drift, tightness, and accuracy of the tracked bounding boxes, especially for complex video scenarios containing long-term partial occlusions or intersections, deformation, or background clutter with color distributions similar to that of the foreground object.
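
To make the online learning step concrete, the following is a minimal Python sketch of a conditionally updated reference distribution. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the update rule, so the linear blending with a learning rate `alpha`, the externally supplied `reliable` flag, and the function names `bhattacharyya` and `update_reference` are all hypothetical. The Bhattacharyya coefficient is used here only as a common appearance similarity measure in mean shift tracking.

```python
import numpy as np


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms (1.0 means identical)."""
    return float(np.sum(np.sqrt(p * q)))


def update_reference(q_ref, p_cur, reliable, alpha=0.1):
    """Blend the current object histogram into the reference model.

    Assumption: the update is a linear blend with rate `alpha`, applied only
    when the caller flags the frame as reliable (stable tracking, no
    background interference). Otherwise the reference is left unchanged.
    """
    if not reliable:
        return q_ref
    q_new = (1.0 - alpha) * q_ref + alpha * p_cur
    return q_new / q_new.sum()  # renormalize to guard against rounding drift


# Toy 3-bin color histograms (hypothetical values).
q_ref = np.array([0.5, 0.5, 0.0])   # reference object distribution
p_cur = np.array([0.0, 0.5, 0.5])   # distribution in the current bounding box

q_kept = update_reference(q_ref, p_cur, reliable=False)
q_updated = update_reference(q_ref, p_cur, reliable=True)

sim_before = bhattacharyya(q_ref, p_cur)
sim_after = bhattacharyya(q_updated, p_cur)
```

In this toy case the updated reference moves toward the current appearance, so its similarity to the current histogram increases, while an unreliable frame leaves the model untouched.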