Efficient R-Tree Based Indexing Scheme for Server-Centric Cloud Storage System


Abstract:

Cloud storage systems pose new challenges to the community in supporting efficient concurrent querying tasks for various data-intensive applications, where indices always play an important role. In this paper, we explore a practical method to construct a two-layer indexing scheme for multi-dimensional data in diverse server-centric cloud storage systems. We first propose RT-HCN, an indexing scheme that integrates an R-tree based indexing structure with an HCN-based routing protocol. RT-HCN organizes storage and compute nodes into an HCN overlay, one of the newly proposed server-centric data center topologies. Based on the properties of HCN, we design a specific index mapping technique to maintain layered global indices, together with corresponding query processing algorithms to support efficient query tasks. Then, we extend the idea of RT-HCN to another server-centric data center topology, DCell, revealing a potentially general and feasible way of deploying two-layer indexing schemes on other server-centric networks. Furthermore, we prove theoretically that RT-HCN is both space-efficient and query-efficient: each node maintains a tolerable number of global indices, while highly concurrent queries can be processed with acceptable overhead. We finally conduct targeted experiments on Amazon's EC2 platform, comparing our design with RT-CAN, a similar indexing scheme for traditional P2P networks. The results validate the query efficiency of RT-HCN, especially its speedup for point queries, indicating its potential applicability in future data centers.
Published in: IEEE Transactions on Knowledge and Data Engineering (Volume: 28, Issue: 6, 01 June 2016)
Page(s): 1503 - 1517
Date of Publication: 04 February 2016

1 Introduction

Cloud storage systems continue to gain attention from both academia and industry. From classical systems for general data services, such as Google's GFS [1], Amazon's Dynamo [2], and Facebook's Cassandra [3], to newly designed specialized systems, such as Haystack [4], Megastore [5], and Spanner [6], various distributed storage systems have been constructed to satisfy the increasing demand of online data-intensive applications that require massive scalability, efficient manageability, reliable availability, and low latency in the storage layer. Many works have proposed new indexing schemes and data management systems to support large-scale data analytical jobs and highly concurrent OLTP queries [7], [8], [9], [10].
