By Topic

OTPM: Failure handling in data-intensive analytical processing

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

4 Author(s)
Binh Han ; School of Computer Science, Georgia Institute of Technology, Atlanta, 30332, USA ; Edward Omiecinski ; Leo Mark ; Ling Liu

Parallel processing is the key to speedup performance and to achieve high throughput in processing large scale data analytical workloads. However, failures of nodes involved in the analytical query can interrupt the whole process, resulting in the complete restart of the query if the system does not have query fault-tolerance. Complete restart might be too costly for processing query on very large databases and might not be able to meet the time constraints in decision support systems. In this paper, we present an approach to resume query processing after failure by keeping track of the point at which data has been processed by an operator, called operator tracking. We also consider saving intermediate results using partial materialization. We look at several fundamental parallel database techniques which are widely used today and analyze the performance cost of query processing and recovery using those techniques with our OTPM fault-tolerance approach. We perform simulation-based experiments which show that our approach incurs only a small resume overhead compared to complete pipelining and complete materialization of intermediate results. Also, the combination of our approach with vertically partitioned database in a shared-nothing environment yields the best performance among different settings for parallel processing of data intensive analytical workloads.

Published in:

Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom), 2011 7th International Conference on

Date of Conference:

15-18 Oct. 2011