Dataspaces consist of large-scale heterogeneous data. A practical dataspace system should provide a query interface for accessing tuples as a fundamental facility. An efficient index has previously been proposed for queries with keyword neighborhood over dataspaces. In this paper, we study the materialization and decomposition of dataspaces in order to improve query efficiency. First, we study views of items, which are materialized so that they can be reused by queries. When a set of views is materialized, query planning amounts to selecting some of them as the optimal plan with the minimum query cost. Efficient algorithms are developed for query planning and view generation. Second, we study partitions of tuples for answering top-k queries. Given a query, we can evaluate the score bounds of the tuples in each partition and prune those partitions whose bounds are lower than the scores of the top-k answers. We also provide a theoretical analysis of query cost and prove that the query efficiency cannot be improved by increasing the number of partitions. Finally, we conduct an extensive experimental evaluation to illustrate the superior performance of the proposed techniques.
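The partition pruning idea for top-k queries can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: tuples are modeled as sets of items, the score of a tuple is the number of items it shares with the query, and each partition is summarized by the union of its items, which yields an upper bound on the score of any tuple inside it. The names `score`, `upper_bound`, and `top_k` are hypothetical.

```python
import heapq


def score(query, tup):
    # Assumed scoring function: number of items shared with the query.
    return len(query & tup)


def upper_bound(query, summary):
    # No tuple in a partition can share more items with the query
    # than the union of all items in that partition does.
    return len(query & summary)


def top_k(query, partitions, k):
    """Return ([(score, tuple), ...], number_of_partitions_scanned)."""
    if k <= 0:
        return [], 0
    # Hypothetical partition summary: the union of the items of its tuples.
    summaries = [frozenset().union(*part) for part in partitions]
    # Visit the most promising partitions first.
    order = sorted(
        range(len(partitions)),
        key=lambda i: upper_bound(query, summaries[i]),
        reverse=True,
    )
    heap = []  # min-heap of (score, partition index, tuple index)
    scanned = 0
    for i in order:
        bound = upper_bound(query, summaries[i])
        # Prune: the bound is lower than the current k-th best score.
        if len(heap) == k and bound < heap[0][0]:
            continue
        scanned += 1
        for j, tup in enumerate(partitions[i]):
            s = score(query, tup)
            if len(heap) < k:
                heapq.heappush(heap, (s, i, j))
            elif s > heap[0][0]:
                heapq.heapreplace(heap, (s, i, j))
    best = sorted(heap, key=lambda entry: -entry[0])
    return [(s, partitions[i][j]) for s, i, j in best], scanned
```

Calling `top_k(query, partitions, k)` with `partitions` given as a list of lists of sets returns the k best-scoring tuples together with the number of partitions that were actually scanned, so the effect of pruning can be observed directly. Under these assumptions the result has the same scores as a full scan, because a pruned partition cannot contain a tuple that beats the current k-th best score.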