With the explosive growth of big data, data-intensive analysis services need large-scale data storage and highly efficient parallel query processing techniques that run in the cloud on lower-end machines, because most organizations adopt inexpensive low-end clusters. A key challenge is to optimize query processing in such a cloud environment. In this paper, we present a new hybrid data access architecture, called HyDB, for providing data-intensive analysis services. HyDB features distributed data storage, parallel data access, and query optimization methods. First, we propose a new data partitioning method based on both workloads and co-located resources. The data partitioning method achieves higher consolidation and outperforms existing approaches. Second, we provide a new parallel access method that includes parallel query processing, optimal query plan generation, and optimal path selection using a plan tree pruning technique. Finally, we have implemented HyDB and conducted extensive experimental studies, which confirm its efficiency.