This work investigates how to automatically parse object trajectories in surveillance videos, a task that requires jointly solving three subproblems: i) spatial segmentation, ii) temporal tracking, and iii) object categorization. We present a novel representation, the spatio-temporal graph (ST-Graph), in which: i) graph nodes express motion primitives, each representing a short sequence of small patches over consecutive frames; and ii) every pair of neighboring nodes is linked by either a positive edge or a negative edge, describing a collaborative or an exclusive relationship with respect to belonging to the same object trajectory. Casting trajectory parsing as a graph multi-coloring problem, we propose a unified probabilistic formulation that integrates various types of contextual knowledge as informative priors. An efficient composite cluster sampling algorithm searches for the optimal solution by exploiting both the collaborative and the exclusive relationships between nodes. The proposed framework is evaluated on challenging videos from public datasets, and the results show that it achieves state-of-the-art tracking accuracy.
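To make the signed-edge idea concrete, the following is a minimal sketch of an ST-Graph-like structure, not the implementation used in this work. It assumes that each node receives a single color (trajectory label), that positive edges prefer equal colors, and that negative edges prefer different colors. The class name `STGraph` and the methods `link` and `violations` are hypothetical, and the probabilistic formulation and composite cluster sampling are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class STGraph:
    """Toy signed graph: nodes are motion-primitive ids, edges carry +1 or -1."""

    # Maps an unordered node pair (stored as a sorted tuple) to its sign.
    edges: dict = field(default_factory=dict)

    def link(self, u, v, sign):
        """Add a positive (+1, collaborative) or negative (-1, exclusive) edge."""
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        self.edges[(min(u, v), max(u, v))] = sign

    def violations(self, coloring):
        """Count edges whose sign disagrees with the given node coloring."""
        count = 0
        for (u, v), sign in self.edges.items():
            same = coloring[u] == coloring[v]
            # A positive edge is violated by different colors,
            # a negative edge by identical colors.
            if (sign == 1 and not same) or (sign == -1 and same):
                count += 1
        return count


# Four hypothetical motion primitives: 0, 1, 2 move together; 3 is separate.
graph = STGraph()
graph.link(0, 1, 1)
graph.link(1, 2, 1)
graph.link(2, 3, -1)
graph.link(0, 3, -1)

consistent = {0: "A", 1: "A", 2: "A", 3: "B"}
inconsistent = {0: "A", 1: "B", 2: "A", 3: "A"}

print(graph.violations(consistent))    # coloring that respects every edge
print(graph.violations(inconsistent))  # coloring that breaks every edge
```

In this toy setting, a coloring with fewer violated edges is a better candidate parse. The formulation described above replaces this hard count with a probabilistic score that also incorporates contextual priors.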