Indexing useful structural patterns for XML query processing
Wang Lian
Mamoulis, N.
Cheung, D.W.
Yiu, S.M.
Fac. of Inf. Technol., Macao Univ. of Sci. & Technol., China
This paper appears in: Knowledge and Data Engineering, IEEE Transactions on
Publication Date: July 2005
Volume: 17
Issue: 7
On page(s): 997-1009
ISSN: 1041-4347
INSPEC Accession Number: 8475640
Digital Object Identifier: 10.1109/TKDE.2005.110
Current Version Published: 2005-05-23
Abstract
Queries on semistructured data are hard to process due to the complex nature of the data and call for specialized techniques. Existing path-based indexes and query processing algorithms are inefficient for searching complex structures beyond simple paths, even when the queries are highly selective. We introduce minimal infrequent structures (MIS): structures that 1) exist in the data, 2) are not frequent with respect to a support threshold, and 3) have only frequent substructures. By indexing the occurrences of MIS, we can efficiently locate the highly selective substructures of a query, improving search performance significantly. We propose an efficient data mining algorithm that finds the minimal infrequent structures. Their occurrences in the XML data are then indexed by a lightweight data structure and used as a fast filter step in query evaluation. We validate the efficiency and applicability of our methods through experimentation on both synthetic and real data.
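The three conditions that define an MIS, and the way MIS occurrences can act as a filter, can be illustrated with a small sketch. This is not the paper's algorithm or index. It rests on simplifying assumptions made here for illustration only: each XML document is reduced to a set of parent/child label edges such as "a/b", a structure is a set of such edges, and support is the number of documents containing all of them. The function names (support, minimal_infrequent_structures, build_mis_index, filter_candidates) and the sample data are hypothetical.

```python
from itertools import combinations


def support(structure, documents):
    """Number of documents that contain every edge of `structure`."""
    return sum(1 for doc in documents if structure <= doc)


def minimal_infrequent_structures(documents, min_support):
    """Level-wise (Apriori-style) search over edge sets.

    Returns structures that 1) occur in the data, 2) have support below
    `min_support`, and 3) have only frequent proper substructures.
    Assumption: a structure is modelled as a frozenset of edge labels.
    """
    mis = []
    level = [frozenset([edge]) for edge in sorted(set().union(*documents))]
    size = 1
    while level:
        frequent = set()
        for candidate in level:
            count = support(candidate, documents)
            if count >= min_support:
                frequent.add(candidate)
            elif count >= 1:
                # Occurs, is infrequent, and every smaller part was frequent.
                mis.append(candidate)
        # Build the next level only from structures whose parts are all frequent.
        size += 1
        next_level = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in frequent
                for sub in combinations(union, size - 1)
            ):
                next_level.add(union)
        level = sorted(next_level, key=sorted)
    return mis


def build_mis_index(documents, mis):
    """Map each MIS to the ids of the documents in which it occurs."""
    return {s: {i for i, doc in enumerate(documents) if s <= doc} for s in mis}


def filter_candidates(query, index, n_documents):
    """Filter step: keep only documents containing every MIS found in the query."""
    candidates = set(range(n_documents))
    for structure, postings in index.items():
        if structure <= query:
            candidates &= postings
    return candidates


# Hypothetical data: four documents, each a set of parent/child edges.
documents = [
    frozenset({"a/b", "a/c", "b/d"}),
    frozenset({"a/b", "a/c"}),
    frozenset({"a/b", "b/d"}),
    frozenset({"a/c", "b/d", "c/e"}),
]
mis = minimal_infrequent_structures(documents, min_support=2)
index = build_mis_index(documents, mis)
print([sorted(s) for s in mis])  # [['c/e'], ['a/b', 'a/c', 'b/d']]
print(filter_candidates(frozenset({"a/c", "c/e"}), index, len(documents)))  # {3}
```

In this toy data every single edge except "c/e" is frequent, and so is every pair of the three frequent edges, but the three together occur in only one document. A query containing either MIS is narrowed to a single candidate document before any full structural matching, while a query containing no MIS is left unfiltered.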