Applying LSI and data reduction to XML for counter terrorism
Demurjian, S.; Rajasekaran, S.; Ammar, R.; Greenshields, I.; Doan, T.; He, L.
Aerospace Conference, 2006 IEEE
Page(s): 11 pp.
Digital Object Identifier 10.1109/AERO.2006.1656047
Summary: Data reduction is a critical problem for counter-terrorism. Large collections of documents must be analyzed and processed, which raises issues of performance, lossless reduction, polysemy (a word's meaning being influenced by its surrounding words), and synonymy (the same concept being expressed by different terms). In this paper, we begin by presenting a survey of latent semantic indexing (LSI) techniques and strategies. Next, we highlight a subset of the LSI software packages that are available commercially and academically. Then, we explore approaches that apply LSI to eXtensible Markup Language (XML) data. Using this as a basis, the paper proposes an approach that applies LSI and data reduction to XML documents by transitioning from support vector machines (SVM) to random projections to LSI. It also speculates on exploiting the semantics of Web-based documents that are captured via XML tags.
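The abstract does not spell out the authors' algorithm. As a rough illustration of the general technique only, the Python sketch below (assuming NumPy is available) builds a term-document matrix from the text inside XML elements, can optionally shrink the term dimension with a Gaussian random projection, and applies LSI as a rank-k truncated SVD. The helper names (`xml_terms`, `term_document_matrix`, `random_projection`, `lsi`), the hypothetical `tag:word` term encoding for keeping some XML tag semantics, and the toy reports are all illustrative assumptions, not taken from the paper.

```python
import re
import xml.etree.ElementTree as ET

import numpy as np


def xml_terms(xml_text, use_tags=True):
    """Tokenize the text directly inside each element. With use_tags=True,
    each term is prefixed with its enclosing tag (a hypothetical 'tag:word'
    encoding that keeps some XML tag semantics). Tail text is ignored."""
    root = ET.fromstring(xml_text)
    terms = []
    for elem in root.iter():
        for word in re.findall(r"[a-z]+", (elem.text or "").lower()):
            terms.append(f"{elem.tag}:{word}" if use_tags else word)
    return terms


def term_document_matrix(docs):
    """Raw term counts: rows are vocabulary terms, columns are documents."""
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for t in doc:
            A[index[t], j] += 1.0
    return A, vocab


def random_projection(A, d, seed=0):
    """Map the m term rows to d rows with a scaled Gaussian random matrix;
    inner products between documents are preserved only approximately."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((d, A.shape[0])) / np.sqrt(d)
    return R @ A


def lsi(A, k):
    """LSI as a rank-k truncated SVD; returns one k-dim vector per document."""
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    return (s[:k, None] * Vt[:k]).T  # shape (n_docs, k)


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical toy reports, for illustration only.
reports = [
    "<report><subject>bomb threat</subject><place>airport</place></report>",
    "<report><subject>explosive threat</subject><place>airport</place></report>",
    "<report><subject>weather delay</subject><place>harbor</place></report>",
]
A, vocab = term_document_matrix([xml_terms(r) for r in reports])
print(cosine(A[:, 0], A[:, 1]))  # approximately 0.667 (shared: threat, airport)
docs_2d = lsi(A, k=2)
print(cosine(docs_2d[0], docs_2d[1]))  # approximately 1.0
print(cosine(docs_2d[0], docs_2d[2]))  # approximately 0.0
# For large vocabularies, project first and then apply LSI to the smaller matrix:
docs_proj = lsi(random_projection(A, d=5), k=2)
```

In this toy case the raw term vectors of the "bomb" and "explosive" reports have cosine 2/3, but rank-2 LSI discards the smallest singular direction (the one separating those two words), so both reports map to the same concept direction. This is the kind of smoothing over synonymy that LSI is used for. The random projection step only approximately preserves document geometry, so its output is not shown with an expected value.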