Information retrieval (IR) systems for large-scale data collections must build an index in order to provide efficient retrieval that meets the user's needs. In distributed IR systems, query response time is affected by the way in which the data collection is partitioned across nodes. There are three types of collection partitioning: document-based partitioning (called the local index), term-based partitioning (called the global index), and hybrid partitioning. In this paper, we compare the three types of partitioning in terms of average query response time for a system with one broker and six other nodes. Our results showed that, within our distributed IR system, document-based and hybrid partitioning outperformed term-based partitioning. However, unlike Xi et al., we did not find that hybrid partitioning was any better than document-based partitioning in terms of average query response time.
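To make the difference between the first two partitioning schemes concrete, the following is a minimal Python sketch, not the system evaluated in the paper. The function names, the round-robin assignment of documents and terms to nodes, the whitespace tokenizer, and the four toy documents are all assumptions made for illustration. Hybrid partitioning is not shown, since the abstract does not describe how it combines the two schemes.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # assumed: naive whitespace tokenizer
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def partition_by_document(docs, num_nodes):
    """Local index: each node indexes only its own subset of the documents."""
    shards = [{} for _ in range(num_nodes)]
    for doc_id, text in docs.items():
        shards[doc_id % num_nodes][doc_id] = text  # assumed: round-robin by id
    return [build_inverted_index(shard) for shard in shards]

def partition_by_term(docs, num_nodes):
    """Global index: each node holds complete posting lists for a subset of terms."""
    full_index = build_inverted_index(docs)
    nodes = [{} for _ in range(num_nodes)]
    for i, term in enumerate(sorted(full_index)):
        nodes[i % num_nodes][term] = full_index[term]  # assumed: round-robin by term
    return nodes

def query_document_partitioned(nodes, term):
    """The broker asks every node and merges the partial posting lists."""
    hits = []
    for node in nodes:
        hits.extend(node.get(term, []))
    return sorted(hits)

def query_term_partitioned(nodes, term):
    """The broker only needs the single node that owns the term."""
    for node in nodes:
        if term in node:
            return node[term]
    return []

# Hypothetical toy collection, split across two nodes.
DOCS = {
    0: "distributed index",
    1: "index partitioning",
    2: "query broker",
    3: "distributed query",
}
local_nodes = partition_by_document(DOCS, 2)
global_nodes = partition_by_term(DOCS, 2)
```

In this sketch a single-term query touches every node under document-based partitioning but only one node under term-based partitioning. Both return the same posting list, so any difference in response time comes from how the work and communication are distributed, which is what the paper measures.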
Date of Conference: 4-6 Aug. 2008