News event modeling and tracking in the social web is the task of discovering which news events individuals in social communities are most interested in, measuring how much discussion these events generate, and tracking these discussions over time. The task can provide informative summaries of what has happened in the real world, yield important knowledge about which events matter most from the crowd's perspective, and reveal their temporal evolutionary trends. Latent Dirichlet Allocation (LDA) has been used extensively for modeling and tracking events (or topics) in text streams. However, the event models discovered by this bottom-up approach have limitations, such as a lack of semantic correspondence to real-world events. In addition, they do not scale well to large datasets. This paper proposes a novel latent Dirichlet framework for event modeling and tracking. Our approach takes into account ontological knowledge about events that exist in the real world to guide the modeling and tracking processes. As a result, event models extracted from the social web by our approach are always meaningful and semantically match real-world events. In practice, our approach requires only a single scan over the dataset to model and track events, and hence scales well with dataset size.