Parallel processing is key to speeding up large-scale data analytical workloads and achieving high throughput. However, the failure of a node involved in an analytical query can interrupt the whole process, and a system without query fault tolerance must then restart the query from the beginning. A complete restart may be too costly for queries on very large databases and may fail to meet the time constraints of decision support systems. In this paper, we present an approach that resumes query processing after a failure by keeping track of the point up to which an operator has processed its data, which we call operator tracking. We also consider saving intermediate results using partial materialization. We examine several fundamental parallel database techniques that are widely used today and analyze the performance cost of query processing and recovery when those techniques are combined with our operator tracking and partial materialization (OTPM) fault-tolerance approach. Our simulation-based experiments show that our approach incurs only a small resume overhead compared to complete pipelining and complete materialization of intermediate results. In addition, combining our approach with a vertically partitioned database in a shared-nothing environment yields the best performance among the settings considered for parallel processing of data-intensive analytical workloads.
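To illustrate the idea of resuming from a tracked point, the following is a minimal sketch and not the implementation evaluated in this paper. The names `TrackedOperator` and `run_with_resume`, the doubling step that stands in for real operator work, and the simulated failure are all hypothetical. The sketch also assumes that the tracked position and the saved intermediate results survive the failure; here they simply stay in memory.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedOperator:
    """Hypothetical operator that records how far it has processed its input."""
    rows: list
    position: int = 0  # tracked point: index of the next row to process
    materialized: list = field(default_factory=list)  # saved intermediate results

    def run(self, fail_at=None):
        # Process rows starting from the tracked position, not from the beginning.
        while self.position < len(self.rows):
            if fail_at is not None and self.position == fail_at:
                raise RuntimeError("simulated node failure")
            # Placeholder for real operator work (assumption: doubling each value).
            self.materialized.append(self.rows[self.position] * 2)
            self.position += 1
        return self.materialized


def run_with_resume(rows, fail_at):
    """Run the operator once; on failure, resume from the tracked position."""
    op = TrackedOperator(rows)
    try:
        return op.run(fail_at=fail_at), 0
    except RuntimeError:
        resumed_from = op.position  # rows before this point are not reprocessed
        return op.run(), resumed_from


if __name__ == "__main__":
    result, resumed_from = run_with_resume([1, 2, 3, 4], fail_at=2)
    print(result, resumed_from)
```

In this sketch, the rows processed before the failure are kept and only the remaining rows are processed after the restart, which is the contrast with a complete restart of the query.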