To schedule computational subtasks optimally in a parallel or distributed database system, a sophisticated query optimizer needs information about the specific structure of the computing resource, the data distribution, data replication, and interconnection bandwidths. Such optimizations are even more important for next-generation OLAP engines, which attempt to replace eager preaggregation with caching and on-demand aggregation into virtual data cubes. This paper discusses the possibilities for query optimization in parallel and distributed OLAP systems, given a detailed description of the underlying computing and storage infrastructure. It introduces a modeling framework for describing the computing resource, which may be applied to a wide array of database systems. It also discusses how profiling information gathered during query execution can be used to dynamically refine the cost estimates given in the metadata.
Published in: Proceedings of the 12th International Workshop on Database and Expert Systems Applications, 2001
Date of Conference: 2001