The efficient storage and retrieval of large multidimensional datasets is an important concern for large-scale scientific computations, such as long-running time-dependent simulations that periodically generate snapshots of their state. The main challenge in handling such datasets efficiently is to minimize the response time of multidimensional range queries. The grid file is one of the well-known access methods for multidimensional and spatial data. We investigate effective and scalable declustering techniques for grid files, with the primary goal of minimizing response time and the secondary goal of maximizing the fairness of data distribution. The main contributions of this paper are (1) the analytic and experimental evaluation of existing index-based declustering techniques and their extensions for grid files; and (2) the development of a proximity-based declustering algorithm called 'minimax', which is experimentally shown to scale and to consistently achieve better response times than available algorithms while maintaining perfect disk distribution.
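To make the two goals concrete, the following is a minimal sketch of how response time and fairness could be measured for a simple index-based declustering. It is not the minimax algorithm and is not taken from the paper. It assumes, for illustration only, that every grid cell is its own bucket (a real grid file may merge cells into shared buckets), that response time is the largest number of cells any one disk must read for a query, and that cells are assigned with a disk-modulo rule. All function names are hypothetical.

```python
from collections import Counter
from itertools import product


def disk_modulo(cell, num_disks):
    """Index-based assignment (assumed rule): sum of cell indices modulo disk count."""
    return sum(cell) % num_disks


def cells_in(ranges):
    """Enumerate grid cells covered by inclusive (low, high) ranges, one per dimension."""
    return product(*(range(lo, hi + 1) for lo, hi in ranges))


def response_time(query, num_disks, assign=disk_modulo):
    """Largest number of cells a single disk must read to answer the range query."""
    loads = Counter(assign(cell, num_disks) for cell in cells_in(query))
    return max(loads.values())


def optimal_response_time(query, num_disks):
    """Lower bound: cells spread perfectly evenly over all disks (ceiling division)."""
    cells = 1
    for lo, hi in query:
        cells *= hi - lo + 1
    return -(-cells // num_disks)


def disk_loads(grid_shape, num_disks, assign=disk_modulo):
    """Cells stored per disk over the whole grid; equal counts mean a fair distribution."""
    whole_grid = [(0, size - 1) for size in grid_shape]
    return Counter(assign(cell, num_disks) for cell in cells_in(whole_grid))


if __name__ == "__main__":
    small_query = [(0, 1), (0, 1)]  # a 2x2 range query on a 2-D grid
    print(response_time(small_query, 4), optimal_response_time(small_query, 4))
    print(dict(disk_loads((4, 4), 4)))
```

Under these assumptions, the 2x2 query touches four cells but two of them land on the same disk, so its response time is 2 while the lower bound is 1, even though the 4x4 grid as a whole is spread evenly over the four disks. This gap between a fair distribution and a good query response time is the kind of behaviour a declustering evaluation has to quantify.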
Published in:
Proceedings of IPPS '96, The 10th International Parallel Processing Symposium
Date of Conference: 15-19 Apr 1996