
Servicing range queries on multidimensional datasets with partial replicas


5 Author(s)
L. Weng (Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA); U. Catalyurek; T. Kurc; Gagan Agrawal

Partial replication is one type of optimization for speeding up queries submitted to large datasets. In partial replication, a portion of the dataset is extracted, reorganized, and redistributed across the storage system. The objective is to reduce the volume of I/O and increase I/O parallelism for different types of queries and for the portions of the dataset that are likely to be accessed frequently. When multiple partial replicas of a dataset exist, a query execution plan should be generated that uses the best combination of subsets of partial replicas (and possibly the original dataset) to minimize query execution time. In this paper, we present a compiler and runtime approach for servicing range queries submitted against distributed scientific datasets. A heuristic algorithm is proposed to choose a set of replicas so as to reduce query execution time. We show the efficiency of the proposed method using datasets and queries from oil reservoir simulation studies on a cluster machine.
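The abstract does not spell out the replica-selection heuristic or its cost model, so the following is only an illustrative sketch of the kind of problem being solved, not the authors' algorithm. It assumes a range query is an axis-aligned box on an integer grid, that each replica (including the original dataset) stores one box, and that reading from a replica costs a hypothetical fixed `setup_cost` plus a per-cell `cell_cost`. It then applies a standard greedy weighted set-cover rule as a stand-in: repeatedly pick the replica with the lowest estimated cost per newly covered cell. The names `Replica`, `cells`, `choose_replicas`, `setup_cost`, and `cell_cost` are all hypothetical, and the toy model ignores I/O parallelism and data distribution, which the paper explicitly targets.

```python
from dataclasses import dataclass
from itertools import product

# A box is one half-open integer interval [lo, hi) per dimension.
Box = tuple[tuple[int, int], ...]


@dataclass(frozen=True)
class Replica:
    """Hypothetical replica descriptor: the region it stores and a toy cost model."""
    name: str
    region: Box
    setup_cost: float  # assumed fixed cost of opening/scheduling this replica
    cell_cost: float   # assumed cost of reading one grid cell from this replica


def cells(box: Box) -> set[tuple[int, ...]]:
    """Enumerate every integer grid cell inside a half-open box."""
    return set(product(*(range(lo, hi) for lo, hi in box)))


def choose_replicas(query: Box, replicas: list[Replica]) -> list[tuple[str, int]]:
    """Greedy weighted set cover: repeatedly pick the replica with the lowest
    estimated cost per newly covered cell until the query box is covered.
    Returns (replica name, cells read from it) in selection order."""
    uncovered = cells(query)
    regions = [(r, cells(r.region)) for r in replicas]  # precompute coverage
    plan: list[tuple[str, int]] = []
    while uncovered:
        best = None  # (cost per cell, replica, cells it would serve)
        for r, region_cells in regions:
            gain = uncovered & region_cells
            if not gain:
                continue
            per_cell = (r.setup_cost + r.cell_cost * len(gain)) / len(gain)
            if best is None or per_cell < best[0]:
                best = (per_cell, r, gain)
        if best is None:
            raise ValueError("part of the query is not covered by any replica")
        _, chosen, gain = best
        plan.append((chosen.name, len(gain)))
        uncovered -= gain
    return plan


# Usage: the original dataset covers everything but is expensive to read;
# two partial replicas cover parts of the query more cheaply.
replicas = [
    Replica("original", ((0, 8), (0, 8)), setup_cost=0.0, cell_cost=1.0),
    Replica("A", ((0, 2), (0, 4)), setup_cost=1.0, cell_cost=0.25),
    Replica("B", ((2, 4), (0, 2)), setup_cost=1.0, cell_cost=0.25),
]
plan = choose_replicas(((0, 4), (0, 4)), replicas)
print(plan)  # [('A', 8), ('B', 4), ('original', 4)]
```

In this toy example the mixed plan has an estimated cost of 3 + 2 + 4 = 9 units, versus 16 for reading the whole query from the original dataset. A greedy rule like this is only an approximation: choosing the cheapest combination of replicas generalizes weighted set cover, so the greedy plan is not guaranteed to be optimal.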

Published in:

CCGrid 2005: IEEE International Symposium on Cluster Computing and the Grid, 2005 (Volume 2)

Date of Conference:

9-12 May 2005