Using Incremental PLSI for Threshold-Resilient Online Event Analysis

2 Author(s)
Tzu-Chuan Chou (Acad. Sinica Taiwan, Nankang); Meng Chang Chen

The goal of online event analysis is to detect events and track their associated documents in real time from a continuous stream of documents generated by multiple information sources. Unlike traditional text categorization methods, event analysis approaches consider the temporal relations among documents. However, such approaches suffer from the threshold-dependency problem: they perform well only within a narrow range of thresholds. In addition, if the contents of a document stream change, the optimal threshold (that is, the threshold that yields the best performance) often changes as well. In this paper, we propose a threshold-resilient online algorithm, called the incremental probabilistic latent semantic indexing (IPLSI) algorithm, which alleviates the threshold-dependency problem and simultaneously maintains the continuity of the latent semantics to better capture the story line development of events. The IPLSI algorithm is theoretically sound and empirically efficient and effective for event analysis. A performance evaluation on the topic detection and tracking (TDT-4) corpus shows that the algorithm reduces the cost of event analysis by as much as 15 to 20 percent and widens the acceptable threshold range by 200 to 300 percent over the baseline.
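
To make the threshold-dependency problem concrete, the following minimal sketch shows a generic single-pass, similarity-threshold detector of the kind the abstract contrasts against. It is not the IPLSI algorithm and not the paper's baseline; the function names (`cosine`, `single_pass_events`), the bag-of-words representation, and the toy stream are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_events(docs, threshold):
    """Hypothetical threshold-based detector (illustration only).

    Each incoming document joins the most similar existing event, or
    opens a new event when no similarity reaches `threshold`.
    """
    centroids = []  # one summed term-count vector per event
    labels = []     # event index assigned to each document
    for text in docs:
        vec = Counter(text.lower().split())
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            centroids[best].update(vec)     # tracking: fold into the event
            labels.append(best)
        else:
            centroids.append(Counter(vec))  # detection: open a new event
            labels.append(len(centroids) - 1)
    return labels


# Toy stream (invented data): three earthquake stories and one election story.
stream = [
    "earthquake hits coastal city",
    "rescue teams reach earthquake city",
    "election results announced tonight",
    "city earthquake death toll rises",
]

for threshold in (0.1, 0.5):
    print(threshold, single_pass_events(stream, threshold))
```

On this toy stream, a threshold of 0.1 groups the three earthquake stories into one event, while 0.5 splits every document into its own event. That sensitivity to a single parameter is the behavior the abstract describes as threshold dependency.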

Published in:

IEEE Transactions on Knowledge and Data Engineering (Volume: 20, Issue: 3)