This paper proposes a novel approach for querying large-scale XML data using a PC cluster system. With the recent spread of the XML format, large-scale data encoded in XML, ranging from several hundred megabytes to several gigabytes, has become common. However, XML databases are often inefficient when dealing with such large XML data, owing to the complexity of the XML data model and of query processing. To cope with this problem, we attempt to construct a parallel XML database on top of a PC cluster system. To this end, we discuss XML data partitioning to enable parallel processing of XML queries, and introduce a path-based partitioning scheme for XML data. The resulting XML fragments are then allocated to cluster nodes. To obtain a cost-efficient allocation of the fragments, we discuss cost functions for parallel XPath processing and an algorithm for computing a pseudo-optimal allocation, which is based on the well-known genetic algorithm. Finally, we demonstrate the effectiveness of the proposed scheme through a series of experiments.
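As a rough illustration of the allocation step, the following Python sketch uses a simple genetic algorithm to assign fragments to cluster nodes. It is not the paper's algorithm: the abstract does not state the cost functions, so the sketch assumes a stand-in cost (the load of the most heavily loaded node), and the names `makespan` and `allocate`, the per-fragment cost list, and all parameter values are hypothetical.

```python
import random


def makespan(allocation, costs, num_nodes):
    """Stand-in cost (an assumption): load of the most heavily loaded node."""
    loads = [0.0] * num_nodes
    for frag, node in enumerate(allocation):
        loads[node] += costs[frag]
    return max(loads)


def allocate(costs, num_nodes, pop_size=40, generations=200,
             mutation_rate=0.1, seed=0):
    """Search for a low-cost fragment-to-node allocation with a basic GA.

    An individual is a list whose i-th entry is the node holding fragment i.
    """
    rng = random.Random(seed)
    n = len(costs)
    population = [[rng.randrange(num_nodes) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: keep the better half of the population.
        population.sort(key=lambda a: makespan(a, costs, num_nodes))
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            # One-point crossover between two surviving parents.
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]
            # Mutation: occasionally move a fragment to a random node.
            for i in range(n):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(num_nodes)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda a: makespan(a, costs, num_nodes))


if __name__ == "__main__":
    fragment_costs = [8, 7, 6, 5, 4, 3, 2, 1]  # hypothetical per-fragment query costs
    best = allocate(fragment_costs, num_nodes=3)
    print(best, makespan(best, fragment_costs, 3))
```

The result is pseudo-optimal in the sense that the search is heuristic: it returns the best allocation found within the given number of generations, with no guarantee of reaching the true minimum.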