A key problem in visual tracking is how to effectively combine spatio-temporal visual information from throughout a video to accurately estimate the state of an object. We address this problem by incorporating Dempster-Shafer (DS) information fusion into the tracking approach. To implement this fusion task, the entire image sequence is partitioned into spatially and temporally adjacent subsequences. A support vector machine (SVM) classifier is trained for object/nonobject classification on each of these subsequences, the outputs of which act as separate data sources. To combine the discriminative information from these classifiers, we further present a spatio-temporal weighted DS (STWDS) scheme. In addition, temporally adjacent sources are likely to share discriminative information on object/nonobject classification. To use such information, an adaptive SVM learning scheme is designed to transfer discriminative information across sources. Finally, the corresponding DS belief function of the STWDS scheme is embedded into a Bayesian tracking model. Experimental results on challenging videos demonstrate the effectiveness and robustness of the proposed tracking approach.